# **Jan Friso Groote · Kim Guldstrand Larsen (Eds.)**

# **Tools and Algorithms for the Construction and Analysis of Systems**

**27th International Conference, TACAS 2021 Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2021 Luxembourg City, Luxembourg, March 27 – April 1, 2021 Proceedings, Part I**

# Lecture Notes in Computer Science 12651

Founding Editors

Gerhard Goos, Germany
Juris Hartmanis, USA

### Editorial Board Members

Elisa Bertino, USA
Wen Gao, China
Bernhard Steffen, Germany
Gerhard Woeginger, Germany
Moti Yung, USA

# Advanced Research in Computing and Software Science Subline of Lecture Notes in Computer Science

Subline Series Editors

Giorgio Ausiello, University of Rome 'La Sapienza', Italy
Vladimiro Sassone, University of Southampton, UK

Subline Advisory Board

Susanne Albers, TU Munich, Germany
Benjamin C. Pierce, University of Pennsylvania, USA
Bernhard Steffen, University of Dortmund, Germany
Deng Xiaotie, Peking University, Beijing, China
Jeannette M. Wing, Microsoft Research, Redmond, WA, USA

More information about this subseries at http://www.springer.com/series/7407

# Tools and Algorithms for the Construction and Analysis of Systems

27th International Conference, TACAS 2021 Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2021 Luxembourg City, Luxembourg, March 27 – April 1, 2021 Proceedings, Part I

Editors

Jan Friso Groote
Eindhoven University of Technology
Eindhoven, The Netherlands

Kim Guldstrand Larsen
Aalborg University
Aalborg East, Denmark

ISSN 0302-9743 ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-030-72015-5 ISBN 978-3-030-72016-2 (eBook)
https://doi.org/10.1007/978-3-030-72016-2

LNCS Sublibrary: SL1 – Theoretical Computer Science and General Issues

© The Editor(s) (if applicable) and The Author(s) 2021. This book is an open access publication.

Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

# ETAPS Foreword

Welcome to the 24th ETAPS! ETAPS 2021 was originally planned to take place in Luxembourg, in its beautiful capital, Luxembourg City. Because of the COVID-19 pandemic, it was changed to an online event.

ETAPS 2021 was the 24th instance of the European Joint Conferences on Theory and Practice of Software. ETAPS is an annual federated conference established in 1998, and consists of four conferences: ESOP, FASE, FoSSaCS, and TACAS. Each conference has its own Program Committee (PC) and its own Steering Committee (SC). The conferences cover various aspects of software systems, ranging from theoretical computer science to foundations of programming languages, analysis tools, and formal approaches to software engineering. Organising these conferences in a coherent, highly synchronised conference programme enables researchers to participate in an exciting event, having the possibility to meet many colleagues working in different directions in the field, and to easily attend talks of different conferences. On the weekend before the main conference, numerous satellite workshops take place that attract many researchers from all over the globe.

ETAPS 2021 received 260 submissions in total, 115 of which were accepted, yielding an overall acceptance rate of 44.2%. I thank all the authors for their interest in ETAPS, all the reviewers for their reviewing efforts, the PC members for their contributions, and in particular the PC (co-)chairs for their hard work in running this entire intensive process. Last but not least, my congratulations to all authors of the accepted papers!

ETAPS 2021 featured the unifying invited speakers Scott Smolka (Stony Brook University) and Jane Hillston (University of Edinburgh) and the conference-specific invited speakers Işil Dillig (University of Texas at Austin) for ESOP and Willem Visser (Stellenbosch University) for FASE. Invited tutorials were provided by Erika Ábrahám (RWTH Aachen University) on the analysis of hybrid systems and Madhusudan Parthasarathy (University of Illinois at Urbana-Champaign) on combining machine learning and formal methods.

ETAPS 2021 was originally supposed to take place in Luxembourg City, Luxembourg, organized by the SnT - Interdisciplinary Centre for Security, Reliability and Trust, University of Luxembourg. The University of Luxembourg was founded in 2003 and is one of the best and most international young universities, with 6,700 students from 129 countries and 1,331 academics from all over the globe. The local organisation team consisted of Peter Y.A. Ryan (general chair), Peter B. Roenne (organisation chair), Joaquin Garcia-Alfaro (workshop chair), Magali Martin (event manager), David Mestel (publicity chair), and Alfredo Rial (local proceedings chair).

ETAPS 2021 was further supported by the following associations and societies: ETAPS e.V., EATCS (European Association for Theoretical Computer Science), EAPLS (European Association for Programming Languages and Systems), and EASST (European Association of Software Science and Technology).

The ETAPS Steering Committee consists of an Executive Board, and representatives of the individual ETAPS conferences, as well as representatives of EATCS, EAPLS, and EASST. The Executive Board consists of Holger Hermanns (Saarbrücken), Marieke Huisman (Twente, chair), Jan Kofron (Prague), Barbara König (Duisburg), Gerald Lüttgen (Bamberg), Caterina Urban (INRIA), Tarmo Uustalu (Reykjavik and Tallinn), and Lenore Zuck (Chicago).

Other members of the steering committee are: Patricia Bouyer (Paris), Einar Broch Johnsen (Oslo), Dana Fisman (Be'er Sheva), Jan Friso Groote (Eindhoven), Esther Guerra (Madrid), Reiko Heckel (Leicester), Joost-Pieter Katoen (Aachen and Twente), Stefan Kiefer (Oxford), Fabrice Kordon (Paris), Jan Křetínský (Munich), Kim G. Larsen (Aalborg), Tiziana Margaria (Limerick), Andrew M. Pitts (Cambridge), Grigore Roșu (Illinois), Peter Ryan (Luxembourg), Don Sannella (Edinburgh), Lutz Schröder (Erlangen), Ilya Sergey (Singapore), Mariëlle Stoelinga (Twente), Gabriele Taentzer (Marburg), Christine Tasson (Paris), Peter Thiemann (Freiburg), Jan Vitek (Prague), Anton Wijs (Eindhoven), Manuel Wimmer (Linz), and Nobuko Yoshida (London).

I'd like to take this opportunity to thank all the authors, attendees, organizers of the satellite workshops, and Springer-Verlag GmbH for their support. I hope you all enjoyed ETAPS 2021.

Finally, a big thanks to Peter, Peter, Magali and their local organisation team for all their enormous efforts to make ETAPS a fantastic online event. I hope there will be a next opportunity to host ETAPS in Luxembourg.

February 2021 Marieke Huisman ETAPS SC Chair ETAPS e.V. President

# Preface

TACAS 2021 was the 27th edition of the conference series on Tools and Algorithms for the Construction and Analysis of Systems. TACAS 2021 was part of the 24th European Joint Conferences on Theory and Practice of Software (ETAPS 2021), which, although originally planned to take place in Luxembourg City, was held as an online event from March 27 to April 1 due to the COVID-19 pandemic.

TACAS is a forum for researchers, developers, and users interested in rigorously based tools and algorithms for the construction and analysis of systems. The conference aims to bridge the gaps between different communities with this common interest and to support them in their quest to improve the utility, reliability, flexibility, and efficiency of tools and algorithms for building computer-controlled systems. There were four types of submissions for TACAS: research papers, regular tool papers, tool demo papers, and case study papers.

This year 141 papers were submitted to TACAS, consisting of 90 research papers, 29 regular tool papers, 16 tool demo papers, and 6 case study papers. Authors were allowed to submit up to four papers. Each paper was reviewed by three Program Committee (PC) members, who made extensive use of subreviewers.

Similarly to previous years, it was possible to submit an artifact alongside a paper, which was mandatory for regular tool and tool demo papers. An artifact might consist of a tool, models, proofs, or other data required for validation of the results of the paper. The Artifact Evaluation Committee (AEC) was tasked with reviewing the artifacts, based on their documentation, ease of use, and, most importantly, whether the results presented in the corresponding paper could be accurately reproduced. Most of the evaluation was carried out using a standardised virtual machine to ensure consistency of the results, except for those artifacts that had special hardware requirements.

The evaluation consisted of two rounds. The first round was carried out in parallel with the work of the PC. The judgment of the AEC was communicated to the PC and weighed in their discussion. The second round took place after paper acceptance notifications were sent out; authors of accepted research papers who did not submit an artifact in the first round could submit their artifact here. In total, 72 artifacts were submitted (63 in the first round and 9 in the second), of which 57 were accepted and 15 rejected. This corresponds to an acceptance rate of 79 percent. Papers with an accepted artifact include a badge on the first page.

Selected authors were requested to provide a rebuttal for both papers and artifacts in case a review gave rise to questions. In total, 166 rebuttals were provided. Using the review reports and rebuttals, the Program and Artifact Evaluation Committees extensively discussed the papers and artifacts and ultimately decided to accept 32 research papers, 7 tool papers, 6 tool demos, and 2 case studies.

Besides the regular conference papers, this two-volume proceedings also contains 8 short papers that describe the participating verification systems and a competition report presenting the results of the 10th SV-COMP, the competition on automatic software verifiers for C and Java programs. These papers were reviewed by a separate program committee (PC); each of the papers was assessed by at least three reviewers. A total of 30 verification systems with developers from 11 countries entered the systematic comparative evaluation, including four submissions from industry. Two sessions in the TACAS program were reserved for the presentation of the results: (1) a summary by the competition chair and of the participating tools by the developer teams in the first session, and (2) an open community meeting in the second session.

March/April 2021

Jan Friso Groote
Kim Guldstrand Larsen
Frédéric Lang
Thierry Lecomte
Thomas Neele
Peter Gjøl Jensen
Dirk Beyer
Alfredo Rial

# Organization

### Program Committee (TACAS)

Christel Baier TU Dresden, Germany
Dirk Beyer LMU Munich, Germany
Armin Biere Johannes Kepler University Linz, Austria
Valentina Castiglioni Reykjavik University, Iceland
Alessandro Cimatti Fondazione Bruno Kessler, Italy
Rance Cleaveland University of Maryland, USA
Pedro R. D'Argenio Universidad Nacional de Córdoba - CONICET, Argentina
Yuxin Deng East China Normal University, China
Carla Ferreira Universidade NOVA de Lisboa, Portugal
Goran Frehse ENSTA Paris, France
Susanne Graf Université Grenoble Alpes/CNRS/VERIMAG, France
Jan Friso Groote (Chair) Eindhoven University of Technology, Netherlands
Orna Grumberg Technion - Israel Institute of Technology, Israel
Klaus Havelund Jet Propulsion Laboratory, USA
Holger Hermanns Saarland University, Germany
Peter Höfner Australian National University, Australia
Hossein Hojjat Rochester Institute of Technology, USA
Falk Howar TU Dortmund, Germany
David N. Jansen Institute of Software, Chinese Academy of Sciences, China
Marcin Jurdziński The University of Warwick, Great Britain
Joost-Pieter Katoen RWTH Aachen/Universiteit Twente, Germany/Netherlands
Jeroen J. A. Keiren Eindhoven University of Technology, Netherlands
Sophia Knight University of Minnesota, USA
Laura Kovács Vienna University of Technology, Austria
Jan Křetínský Technical University of Munich, Germany
Alfons Laarman Leiden University, Netherlands
Frédéric Lang Inria Grenoble - Rhône-Alpes/CONVECS, France
Kim Guldstrand Larsen (Chair) Aalborg University, Denmark
Thierry Lecomte ClearSy Systems Engineering, France
Xinxin Liu Institute of Software, Chinese Academy of Sciences, China
Mieke Massink CNR-ISTI, Italy
Radu Mateescu Inria, France
Jun Pang University of Luxembourg, Luxembourg


### Artifact Evaluation Committee – AEC

Elvio Gilberto Amparore University of Turin, Italy
Haniel Barbosa Universidade Federal de Minas Gerais, Brazil
František Blahoudek University of Texas at Austin, USA
Olav Bunte Eindhoven University of Technology, Netherlands
Damien Busatto-Gaston Université Libre de Bruxelles, Belgium
Nathalie Cauchi University of Oxford, Great Britain
Jesús Mauricio Chimento KTH, Sweden
Joshua Dawes University of Luxembourg, Luxembourg
Mathias Fleury Johannes Kepler University Linz, Austria
Daniel J. Fremont University of California, Santa Cruz, USA
Manuel Gieseking University of Oldenburg, Germany
Peter Gjøl Jensen (Chair) Aalborg University, Denmark
Kush Grover Technical University of Munich, Germany
Hans-Dieter Hiep CWI, Netherlands
Daniela Kaufmann Johannes Kepler University Linz, Austria
Mitja Kulczynski Kiel University, Germany
Alfons Laarman Leiden University, Netherlands
Luca Laurenti University of Oxford, Great Britain
Maurice Laveaux Eindhoven University of Technology, Netherlands
Yong Li Institute of Software, Chinese Academy of Sciences, China
Debasmita Lohar Max Planck Institute for Software Systems, Germany
Viktor Malík Brno University of Technology, Czech Republic
Joshua Moerman RWTH Aachen University, Germany
Stefanie Mohr Technische Universität München, Germany
Marco Muñiz Aalborg University, Denmark
Thomas Neele (Chair) Royal Holloway University of London, Great Britain
Wytse Oortwijn University of Twente, Netherlands
Elizabeth Polgreen University of Edinburgh, Great Britain
José Proença CISTER-ISEP and HASLab-INESC TEC, Portugal
Etienne Renault LRDE, France
Alceste Scalas Technical University of Denmark, Denmark
Morten Konggaard Schou Aalborg University, Denmark
Veronika Šoková Brno University of Technology, Czech Republic
Yoni Zohar Stanford University, USA

### Program Committee and Jury – SV-COMP


### Steering Committee

Dirk Beyer LMU Munich, Germany
Rance Cleaveland University of Maryland, USA
Holger Hermanns Saarland University, Germany


### Additional Reviewers

Abate, Carmine Achilleos, Antonis Akshay, S. Andriushchenko, Roman André, Étienne Asadi, Sepideh Ashok, Pranav Azeem, Muqsit Bannister, Callum Barnett, Lee Basile, Davide Batz, Kevin Baumgartner, Peter Becchi, Anna ter Beek, Maurice H. Bendík, Jaroslav Bensalem, Saddek van der Berg, Freark Berg, Jeremias Berger, Philipp Bernardo, Marco Biewer, Sebastian Bischopink, Christopher Blicha, Martin Bønneland, Frederik M. Bouvier, Pierre Bozzano, Marco Brellmann, David Broccia, Giovanna Budde, Carlos E. Bursuc, Sergiu Cassel, Sofia Castro, Pablo Chalupa, Marek Chen, Mingshuai Chiang, James Ciancia, Vincenzo Ciesielski, Maciej

Clement, Bradley Coenen, Norine Cubuktepe, Murat Degiovanni, Renzo Demasi, Ramiro Dierl, Simon Dixon, Alex van Dijk, Tom Donatelli, Susanna Dongol, Brijesh Edera, Alejandro Eisentraut, Julia Emmi, Michael Evangelidis, Alexandros Fedotov, Alexander Fedyukovich, Grigory Fehnker, Ansgar Feng, Weizhi Ferreira, Francisco Fleury, Mathias Freiberger, Felix Frenkel, Hadar Friedberger, Karlheinz Fränzle, Martin Funke, Florian Gallá, Francesco Garavel, Hubert Geatti, Luca Gengelbach, Arve Goodloe, Alwyn Goorden, Martijn Goudsmid, Ohad Griggio, Alberto Groce, Alex Grover, Kush Hafidi, Yousra Hallé, Sylvain Hecking-Harbusch, Jesko Heizmann, Matthias Holzner, Stephan Holík, Lukáš Hyvärinen, Antti Irfan, Ahmed Javed, Omar Jensen, Mathias Claus Jonas, Martin Junges, Sebastian Käfer, Nikolai Kanav, Sudeep Kapus, Timotej Kauffman, Sean Khamespanah, Ehsan Kheireddine, Anissa Kiviriga, Andrej Klauck, Michaela Kobayashi, Naoki Köhl, Maximilian Alexander Kozachinskiy, Alexander Kutsia, Temur Lahkim Bennani, Ismail Lammich, Peter Lang, Frédéric Lanotte, Ruggero Latella, Diego Laurenti, Luca Ledent, Philippe Lehtinen, Karoliina Lemberger, Thomas Li, Jianlin Li, Qin Li, Xie Li, Xin Lin, Shaokai Lion, Benjamin Liu, Jiaxiang Liu, Wanwei Loreti, Michele Magnago, Enrico Major, Juraj Marché, Claude Mariegaard, Anders Marsso, Lina Mauritz, Malte McClurg, Jedidiah

Meggendorfer, Tobias Metzger, Niklas Meyer, Roland Micheli, Andrea Mittelmann, Munyque Mizera, Andrzej Moerman, Joshua Mohr, Stefanie Mora, Federico Mover, Sergio Mues, Malte Muller, Lucie Muroor-Nadumane, Ajay Möhle, Sibylle Neele, Thomas Noll, Thomas Norman, Gethin Otoni, Rodrigo Parys, Paweł Pattinson, Dirk Pavela, Jiří Pena, Lucas Pinault, Laureline Piribauer, Jakob Pirogov, Anton Pommellet, Adrien Quatmann, Tim Rappoport, Omer Raskin, Jean-François Rothenberg, Bat-Chen Rouquette, Nicolas Rümmer, Philipp S., Krishna Šafránek, David Sankaranarayanan, Sriram Schallau, Till Schupp, Stefan Serwe, Wendelin Shafiei, Nastaran Shi, Xiaomu Síč, Juraj Sickert, Salomon Singh, Gagandeep Slivovsky, Friedrich Sølvsten, Steffan Song, Fu

Spel, Jip Srivathsan, B. Stankovic, Miroslav Stock, Gregory Strejček, Jan Su, Cui Suda, Martin Sun, Jun Svozil, Alexander Tian, Chun Tibo, Alessandro Tini, Simone Tonetta, Stefano Trtík, Marek Turrini, Andrea

Vandin, Andrea Weber, Tjark Weininger, Maximilian Wendler, Philipp Wolf, Karsten Wolovick, Nicolás Wu, Zhilin Xu, Ming Yang, Pengfei Yang, Xiaoxiao Zhan, Naijun Zhang, Min Zhang, Wenbo Zhang, Wenhui Zhao, Hengjun

# Contents – Part I

#### Game Theory


Roman Andriushchenko, Milan Češka, Sebastian Junges, and Joost-Pieter Katoen




# Contents – Part II

#### Verification Techniques (not SMT)


### Tool Papers



# **Game Theory**

# A Game for Linear-time–Branching-time Spectroscopy

Benjamin Bisping (✉) and Uwe Nestmann

Technische Universität Berlin, Berlin, Germany {benjamin.bisping,uwe.nestmann}@tu-berlin.de

Abstract. We introduce a generalization of the bisimulation game that can be employed to find all relevant distinguishing Hennessy–Milner logic formulas for two compared finite-state processes. By measuring the use of expressive powers, we adapt the formula generation to just yield formulas belonging to the coarsest distinguishing behavioral preorders/equivalences from the linear-time–branching-time spectrum. The induced algorithm can determine the best fit of (in)equivalences for a pair of processes.

Keywords: Process equivalence spectrum · Distinguishing formulas · Bisimulation games.

### 1 Introduction

Have you ever looked at two system models and wondered what would be the finest notions of behavioral equivalence to equate them—or, conversely: the coarsest ones to distinguish them? We run into this situation often when analyzing models and, especially, when devising examples for teaching. We then find ourselves fiddling around with whiteboards and various tools, each implementing different equivalence checkers. Would it not be nice to *decide all equivalences at once*?

*Example 1.* Consider the following CCS process P<sub>1</sub> = a.(b + c) + a.d. It describes a machine that can be activated (a) and then either is in a state where one can choose between b and c, or in a state where it can only be deactivated again (d). P<sub>1</sub> shares a lot of properties with P<sub>2</sub> = a.(b + d) + a.(c + d). For example, they have the same traces (and the same completed traces). Thus, they are (completed) trace equivalent.

But they also have differences. For instance, P<sub>1</sub> has a run where it executes a and then cannot do d, while P<sub>2</sub> does not have such a run. Hence, they are *not failure equivalent*. Moreover, P<sub>1</sub> may perform a and then offer the choice between b and c, which P<sub>2</sub> cannot. This renders the two processes also *not simulation equivalent*. Failure equivalence and simulation equivalence are incomparable—that is, neither one follows from the other. *Both* are coarsest ways of telling the processes apart. Other inequivalences, like bisimulation inequivalence, are implied by both.
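The situation in Ex. 1 can be checked mechanically. The following is an illustrative sketch (the state names and the `STEPS` transition table are our encoding, not notation from the paper) that confirms the trace equality and the failure difference:

```python
# Transition tables for P1 = a.(b + c) + a.d and P2 = a.(b + d) + a.(c + d).
# Each state maps to its outgoing (action, successor) pairs; "0" is the
# completed process. State names are illustrative.
STEPS = {
    "P1": [("a", "bc"), ("a", "d")],
    "bc": [("b", "0"), ("c", "0")],
    "d":  [("d", "0")],
    "0":  [],
    "P2": [("a", "bd"), ("a", "cd")],
    "bd": [("b", "0"), ("d", "0")],
    "cd": [("c", "0"), ("d", "0")],
}

def traces(state, depth):
    """All action sequences of length <= depth observable from `state`."""
    result = {()}
    if depth > 0:
        for action, succ in STEPS[state]:
            result |= {(action,) + t for t in traces(succ, depth - 1)}
    return result

def enabled(state):
    return {a for a, _ in STEPS[state]}

# Same traces, so P1 and P2 are trace equivalent (the longest traces have
# length 2, so depth 3 suffices) ...
assert traces("P1", 3) == traces("P2", 3)
# ... but P1 can reach a state via a where d is not enabled (a "failure"),
# while every a-successor of P2 still enables d:
assert any("d" not in enabled(s) for a, s in STEPS["P1"] if a == "a")
assert all("d" in enabled(s) for a, s in STEPS["P2"] if a == "a")
```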

In the following, we present a uniform game-based way of finding the most fitting notions of (in)equivalence for process pairs like in Ex. 1.

<sup>©</sup> The Author(s) 2021

J. F. Groote and K. G. Larsen (Eds.): TACAS 2021, LNCS 12651, pp. 3–19, 2021. https://doi.org/10.1007/978-3-030-72016-2_1

Our approach is based on the fact that notions of process equivalence can be characterized by two-player games. The defender's winning region in the game corresponds to pairs of equivalent states, and the attacker's winning strategies correspond to distinguishing formulas of Hennessy–Milner logic (HML).

Each notion of equivalence in van Glabbeek's famous linear-time–branching-time spectrum [10] can be characterized by a subset of HML with specific distinguishing power. Some of the notions are incomparable. So, often a process pair that is equivalent with respect to one equivalence is distinguished by a set of slightly coarser or incomparable equivalences, without any one of them alone being *the* coarsest way to distinguish the pair. As with the spectrum of light, where a mix of wavelengths appears to us as a color, there is a "mix" of distinguishing capabilities involved in establishing whether a specific equivalence is finest. Our algorithm is meant to analyze what is in the mix.

Contributions. This paper makes the following contributions:


We frame the contributions by a roundtrip through the basics of HML, games and the spectrum (Section 2), a discussion of related work (Section 5), and concluding remarks on future lines of research (Section 6).

### 2 Preliminaries: HML, Games, and the Spectrum

We use the concepts of transition systems, games, observations, and notions of equivalence, largely in the wake of Hennessy and Milner's seminal paper [14].

#### 2.1 Transition Systems and Hennessy–Milner Logic

*Labeled transition systems* capture a discrete world view, where there is a current state and a branching structure of possible state changes to future states.

Definition 1 (Labeled transition system). A *labeled transition system* is a tuple S = (P, Σ, →) where P is the set of *states*, Σ is the set of *actions*, and → ⊆ P × Σ × P is the *transition relation*.

*Hennessy–Milner logic* [14] describes finite *observations* (or "tests") that one can perform on such a system.

Definition 2 (Hennessy–Milner logic). Given an alphabet Σ, the syntax of *Hennessy–Milner logic* formulas, HML[Σ], is inductively defined as follows:

- **Observations** If φ ∈ HML[Σ] and a ∈ Σ, then ⟨a⟩φ ∈ HML[Σ].
- **Conjunctions** If φ<sub>i</sub> ∈ HML[Σ] for all i from an index set I, then ⋀<sub>i∈I</sub> φ<sub>i</sub> ∈ HML[Σ].
- **Negations** If φ ∈ HML[Σ], then ¬φ ∈ HML[Σ].

We often just write ⋀{φ<sub>0</sub>, φ<sub>1</sub>, ...} for ⋀<sub>i∈I</sub> φ<sub>i</sub>. ⊤ denotes ⋀∅, the nil-element of the syntax tree, and ⟨a⟩ is a short-hand for ⟨a⟩⊤. Let us also implicitly assume that formulas are flattened in the sense that conjunctions do not contain other conjunctions as immediate subformulas. We will sometimes talk about the syntax tree height of a formula and consider the height of ⊤ to equal 0.

Intuitively, ⟨a⟩φ means that one can observe a system transition labeled by a and then continue to make observation(s) φ. Conjunction and negation work as known from propositional logic. We will provide a common game semantics for HML in the following subsection.
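The three syntactic constructs of Def. 2 can be rendered as a small syntax tree type. This is an illustrative encoding of ours (the class names, the tuple representation of conjunctions, and the choice that negation adds one to the height are assumptions, not notation from the paper):

```python
from dataclasses import dataclass

# A minimal HML[Σ] syntax tree following Def. 2.
@dataclass(frozen=True)
class Obs:            # <a>phi : observe action `a`, then continue with phi
    action: str
    phi: object

@dataclass(frozen=True)
class Conj:           # /\_{i in I} phi_i ; Conj(()) plays the role of T
    parts: tuple

@dataclass(frozen=True)
class Neg:            # ~phi
    phi: object

TOP = Conj(())        # T = the empty conjunction, syntax tree height 0

def height(phi):
    """Syntax-tree height, with height(TOP) == 0 as in the text."""
    if isinstance(phi, (Obs, Neg)):
        return 1 + height(phi.phi)
    return 1 + max((height(p) for p in phi.parts), default=-1)
```

For instance, `Obs("a", Neg(Obs("d", TOP)))` encodes ⟨a⟩¬⟨d⟩⊤, and flattening (no conjunction directly inside a conjunction) is a convention the constructors do not enforce here.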

#### 2.2 Game Semantics of HML

Let us fix some notions for *Gale-Stewart-style reachability games* where the defender wins all infinite plays.

Definition 3 (Games). A *simple reachability game* 𝒢[g<sub>0</sub>] = (G, G<sub>d</sub>, ↣, g<sub>0</sub>) consists of

- a set of *game positions* G, partitioned into *defender positions* G<sub>d</sub> ⊆ G and *attacker positions* G<sub>a</sub> := G \ G<sub>d</sub>,
- a *graph of game moves* ↣ ⊆ G × G, and
- an *initial position* g<sub>0</sub> ∈ G.

Definition 4 (Plays and wins). We call the paths g<sub>0</sub>g<sub>1</sub>... ∈ G<sup>∞</sup> with g<sub>i</sub> ↣ g<sub>i+1</sub> *plays* of 𝒢[g<sub>0</sub>]. The defender *wins* infinite plays. If a finite play g<sub>0</sub>...g<sub>n</sub> is stuck, the stuck player loses: The defender wins if g<sub>n</sub> ∈ G<sub>a</sub>, and the attacker wins if g<sub>n</sub> ∈ G<sub>d</sub>.

Definition 5 (Strategies and winning strategies). A (positional, nondeterministic) *strategy* is a subset of the moves, F ⊆ ↣. If (fairly) picking elements of strategy F ensures a player to win, F is called a *winning strategy* for this player. The player with a winning strategy for 𝒢[g<sub>0</sub>] is said to *win* 𝒢[g<sub>0</sub>].

Definition 6 (Winning regions). The set W<sub>a</sub> ⊆ G of all positions g where the attacker wins 𝒢[g] is called the *attacker winning region* (the *defender winning region* W<sub>d</sub> is defined analogously).

All Gale-Stewart-style reachability games are *determined*, that is, W<sub>a</sub> ∪ W<sub>d</sub> = G. The winning regions of finite simple reachability games can be computed in time linear in the number of game moves (cf. [13]). This is why the spectroscopy game of this paper can easily be used in algorithms. It derives from the following game.
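The linear-time computation referred to above is the standard backward induction over predecessor counts. A self-contained sketch under our own encoding (positions as hashable values, `moves` as an adjacency map; none of these names come from the paper):

```python
from collections import deque

def attacker_winning_region(positions, defender, moves):
    """Attacker winning region of a finite simple reachability game
    (Defs. 3-6): the defender wins infinite plays, a stuck player loses.
    Runs in time linear in the number of game moves."""
    preds = {g: [] for g in positions}
    for g in positions:
        for h in moves[g]:
            preds[h].append(g)
    # needed[g]: how many attacker-won successors g still requires before
    # the attacker wins g (one for attacker positions, all for defender).
    needed = {g: (len(moves[g]) if g in defender else 1) for g in positions}
    # Base case: stuck defender positions are lost for the defender.
    win = {g for g in positions if g in defender and not moves[g]}
    todo = deque(win)
    while todo:
        h = todo.popleft()
        for g in preds[h]:
            needed[g] -= 1
            if needed[g] == 0 and g not in win:
                win.add(g)
                todo.append(g)
    return win

# A chain ending in a stuck defender position: the attacker wins everywhere.
POS = {"g0", "g1", "g2"}
DEF = {"g2"}
MOV = {"g0": ["g1"], "g1": ["g2"], "g2": []}
# Closing the chain into a cycle lets the defender force an infinite play.
MOV_LOOP = {"g0": ["g1"], "g1": ["g2"], "g2": ["g0"]}
```

Positions never added to `win` belong to the defender winning region, which is consistent with determinacy (W<sub>a</sub> ∪ W<sub>d</sub> = G).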

Definition 7 (HML game). For a transition system S = (P, Σ, →), the *HML game* 𝒢<sup>S</sup><sub>HML</sub>[g<sub>0</sub>] = (G, G<sub>d</sub>, ↣, g<sub>0</sub>) is played on G = P × HML[Σ], where the defender controls observations and negated conjunctions, that is, (p, ⟨a⟩φ) ∈ G<sub>d</sub> and (p, ¬⋀<sub>i∈I</sub>φ<sub>i</sub>) ∈ G<sub>d</sub> (for all φ, p, I), and the attacker controls the rest. There are five kinds of moves:

$$\begin{array}{lllll} - & (p, \langle a \rangle \varphi) & \longmapsto & (p', \varphi) & \mbox{if } p \stackrel{a}{\to} p', \\ - & (p, \neg \langle a \rangle \varphi) & \longmapsto & (p', \neg \varphi) & \mbox{if } p \stackrel{a}{\to} p', \\ - & (p, \bigwedge_{i \in I} \varphi_{i}) & \longmapsto & (p, \varphi_{i}) & \mbox{with } i \in I, \\ - & (p, \neg \bigwedge_{i \in I} \varphi_{i}) & \longmapsto & (p, \neg \varphi_{i}) & \mbox{with } i \in I, \mbox{ and} \\ - & (p, \neg \neg \varphi) & \longmapsto & (p, \varphi). \end{array}$$

Like in other logical games in the Ehrenfeucht–Fraïssé tradition, the attacker plays the conjunctions and universal quantifiers, whereas the defender plays the disjunctions and existential quantifiers. For instance, (p, ⟨a⟩φ) is declared a defender position, since ⟨a⟩φ is meant to become true precisely if *there exists* a state p′, reachable by an a-transition p →ᵃ p′, where φ is true.

As every move strictly reduces the height of the formula, the game must be finite-depth (and cycle-free), and, for image-finite systems and formulas, also finite. It is determined, and the following semantics is total.

Definition 8 (HML semantics). For a transition system S = (P, Σ, →), the *semantics of HML* is given by defining that φ *is true at* p in S iff the defender wins 𝒢<sup>S</sup><sub>HML</sub>[(p, φ)].

*Example 2.* Continuing Ex. 1, ⟨a⟩¬⟨d⟩⊤ is false at P<sub>2</sub> in the CCS transition system: No matter whether the defender plays to (b + d, ¬⟨d⟩⊤) or to (c + d, ¬⟨d⟩⊤), the attacker wins by moving to the stuck defender position (**0**, ¬⊤). (Recall that ⊤ is the empty conjunction and that **0** is the completed process!)
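Since the HML game is finite-depth, its game semantics agrees with the usual recursive satisfaction check, which can be sketched directly. The tuple encoding of formulas and the `STEPS` table for the processes of Ex. 1 are our own illustration, not the paper's notation:

```python
# Formulas as nested tuples: ("obs", a, phi), ("and", [phi, ...]), ("neg", phi).
def satisfies(steps, p, phi):
    """Recursive HML satisfaction, equivalent to the game semantics
    (Def. 8) for finite formulas on finite-state systems."""
    kind = phi[0]
    if kind == "obs":
        _, a, sub = phi
        return any(act == a and satisfies(steps, q, sub)
                   for act, q in steps[p])
    if kind == "and":
        return all(satisfies(steps, p, sub) for sub in phi[1])
    if kind == "neg":
        return not satisfies(steps, p, phi[1])
    raise ValueError(phi)

# Transition tables for P1 = a.(b + c) + a.d and P2 = a.(b + d) + a.(c + d).
STEPS = {
    "P1": [("a", "bc"), ("a", "d")],
    "bc": [("b", "0"), ("c", "0")],
    "d":  [("d", "0")],
    "0":  [],
    "P2": [("a", "bd"), ("a", "cd")],
    "bd": [("b", "0"), ("d", "0")],
    "cd": [("c", "0"), ("d", "0")],
}
TOP = ("and", [])                                   # T, the empty conjunction
PHI = ("obs", "a", ("neg", ("obs", "d", TOP)))      # <a>~<d>T from Ex. 2
assert satisfies(STEPS, "P1", PHI)
assert not satisfies(STEPS, "P2", PHI)
```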

#### 2.3 The Spectrum of Behavioral Equivalences

Definition 9 (Distinguishing formula). A formula φ *distinguishes* state p from q iff φ is true at p and not at q.<sup>1</sup>

*Example 3.* ⟨a⟩¬⟨d⟩⊤ distinguishes P<sub>1</sub> from P<sub>2</sub> in Ex. 1 (but not the other way around). ⟨a⟩⋀{⟨b⟩⊤, ⟨d⟩⊤} distinguishes P<sub>2</sub> from P<sub>1</sub>.

Definition 10 (Observational preorders and equivalences). A set of observations, O<sub>X</sub> ⊆ HML[Σ], *preorders* two states p, q, written p ⪯<sub>X</sub> q, iff no formula φ ∈ O<sub>X</sub> distinguishes p from q. If p ⪯<sub>X</sub> q and q ⪯<sub>X</sub> p, then the two are X-*equivalent*, written p ≡<sub>X</sub> q.

<sup>1</sup> In the following, we usually leave the transition system <sup>S</sup> implicit.

Definition 11 (Linear-time–branching-time languages [12]). The linear-time–branching-time spectrum is a lattice of observation languages (and of entailed process preorders and equivalences). Every observation language O<sub>X</sub> can perform trace observations, that is, ⊤ ∈ O<sub>X</sub> and, if φ ∈ O<sub>X</sub>, then ⟨a⟩φ ∈ O<sub>X</sub>. At the more linear-time side of the spectrum we have:


At the more branching-time side, we have simulation observations. Every simulation observation language O<sub>XS</sub> has full conjunctive capacity, that is, if φ<sub>i</sub> ∈ O<sub>XS</sub> for all i ∈ I, then ⋀<sub>i∈I</sub>φ<sub>i</sub> ∈ O<sub>XS</sub>.


The observation languages of the spectrum differ in how many of the syntactic features of HML one will encounter when descending into a formula's syntax tree. We will come back to this in Subsection 3.4.

Note that we consider ⋀{φ} to be an alias for φ. With this aliasing, all the listed observation languages are *closed* in the sense that all subformulas of an observation are themselves part of that language. They thus are *inductive* in the sense that all observations must be built from observations of the same language with lower syntax tree height.

### 3 Distinguishing Formula Games

This section introduces our main contribution: the spectroscopy game (Def. 13), and how to build all interesting distinguishing HML formulas from its winning region (Def. 14). To justify our construction and to prove that we indeed find distinguishing formulas (Thm. 1), let us first examine the formula preorder game (Def. 12), which is closer to the problem whether formulas are (non-)distinguishing.

<sup>2</sup> Like Kučera and Esparza [17], who studied the properties of "good" observation languages, we gloss over completed trace, completed simulation and possible worlds observations here, because these observations need a special exhaustive conjunction ⋀_{a∈Σ} over all actions. While it could be provided for with additional operators, it would add another case to each of the upcoming definitions and would break the closure property of observation languages, without giving much in return.

#### 3.1 The Formula Preorder Game

Def. 10 entails a straightforward way of turning the problem whether a set of observations O ⊆ O_X preorders two states p, q into a game: Have the attacker pick a supposedly distinguishing formula ϕ ∈ O, and then have the defender choose whether to play the HML game (Def. 7) for ¬ϕ at p or for ϕ at q. This direct route will yield infinite games for infinite O—and all the languages from Def. 11 are infinite!

To bypass the infinity issue, we will introduce a variation of this game *where the attacker gradually chooses their attacking formula*. In particular, this means that the attacker now decides which observations to play. In return, the defender does not need to pick a side in the beginning and may postpone the decision where (on the right-hand side) an observation leads. Postponing decisions here means that the defender may play non-deterministically, moving to multiple states at once. The mechanics are analogous to the standard powerset construction when transforming non-deterministic finite automata into deterministic ones.
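The powerset analogy can be made concrete in a few lines. The following is a minimal sketch (Python; the dictionary encoding of the transition relation is our own illustration, not the paper's notation): the defender's non-deterministic answer to an observation is exactly the set of all states any of its current states could reach via that action.

```python
def step(states, action, trans):
    """Powerset step: all states reachable from any state in `states`
    via `action`; `trans` maps (state, action) -> set of successors."""
    return frozenset(s2 for s in states
                     for s2 in trans.get((s, action), set()))

# A small non-deterministic system: action 'a' from q leads to two states.
trans = {
    ("q", "a"): {"q1", "q2"},
    ("q1", "b"): {"q3"},
}
assert step(frozenset({"q"}), "a", trans) == frozenset({"q1", "q2"})
assert step(frozenset({"q1", "q2"}), "b", trans) == frozenset({"q3"})
```

This is the same subset step used when determinizing an NFA; here it lets the defender keep all right-hand options open at once.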

Definition 12 (Formula preorder game). *For a transition system S = (P, Σ, →) and a set of observations O_X, the* formula preorder game *G^S_X[g₀] = (G, G_d, ↝, g₀) consists of*


*and five kinds of moves*


$$\textbf{if } \bigwedge\limits_{i \in I} \varphi_{i} \in \mathcal{O},$$


The formula preorder game precisely characterizes whether an observation language is distinguishing:

Lemma 1. *For a closed observation language O_X, the formula preorder game G^S_X[(p, Q, O)_a] with O ⊆ O_X is won by the defender precisely if, for every observation ϕ ∈ O that is true of p, there is a q ∈ Q such that ϕ is true of q.*

*Proof (Sketch).* By induction over the height of formulas in O_X with arbitrary p and Q, strengthening the induction predicate to not only consider ϕ but also partial conjunctions ⋀O′ with O′ ⊆ O whenever ϕ = ⋀O. To prove the right-to-left direction, exploiting the determinacy of the game is convenient.

Figure 1. Schematic spectroscopy game G of Def. 13. Boxes stand for attacker positions, circles for defender positions, arrows for moves. From the dashed boxes, the moves are analogous to the ones of the connected solid positions.

#### 3.2 The Spectroscopy Game

Let us now remove the formulas from the formula game (Def. 12). The idea is to look at the game for the whole of HML, called GB. Only attack moves in the formula game change the current set of observations, and they are completely guided by the context-free grammar of HML (Def. 2). Therefore, we can<sup>3</sup> assume <sup>O</sup> to equal HML[Σ] in every reachable position of <sup>G</sup>B. Effectively, <sup>O</sup> can be canceled out of the game, without losing any information. We call the remaining game the "spectroscopy game." Figure 1 gives a graphical representation.

Definition 13 (Spectroscopy game). *For a transition system S = (P, Σ, →), the* L-labeled spectroscopy game *G^S[g₀] = (G, G_d, ↝, g₀), with labels L comprising ¬, ∧, ∗, and the actions a ∈ Σ, consists of*


*and four kinds of moves:*

– observation moves (p, Q)_a ↝^a (p′, {q′ | ∃q ∈ Q. q →^a q′})_a *if* p →^a p′,
– conjunct challenges (p, Q)_a ↝^∧ (p, Q)_d,
– conjunct answers (p, Q)_d ↝^∗ (p, {q})_a *if* q ∈ Q, *and*
– negation moves (p, {q})_a ↝^¬ (q, {p})_a.

We have already introduced two tricks in this definition to ease formula reconstruction in the next subsection. (1) The attack moves are labeled with the

<sup>3</sup> To be precise: Finite conjunctions may only lead to *arbitrarily large* subsets of HML[Σ]. If the attacker has a way of winning by playing a conjunction, we can just as well approximate this move as playing ⋀ HML[Σ].

syntactic constructs from which they originate. This does not change the expressive power. (2) Negation moves are restricted to situations where Q = {q}. After all, winning attacker strategies only play a negation after minimizing the risk of being put in a bad position anyway.
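To make the move rules of Def. 13 concrete, here is a small sketch of how one might enumerate the labeled moves from a game position (Python; the tuple encoding of positions and the helper names are our own, not the tool's API):

```python
def moves(pos, trans, actions):
    """Enumerate labeled spectroscopy-game moves from `pos`.
    Positions are ('a', p, Q) for the attacker and ('d', p, Q) for the
    defender, with Q a frozenset of states; `trans` maps
    (state, action) -> set of successor states."""
    kind, p, Q = pos
    out = []
    if kind == 'a':
        for a in actions:  # observation moves, labeled by the action
            for p2 in trans.get((p, a), set()):
                Q2 = frozenset(q2 for q in Q
                               for q2 in trans.get((q, a), set()))
                out.append((a, ('a', p2, Q2)))
        out.append(('∧', ('d', p, Q)))  # conjunct challenge
        if len(Q) == 1:                 # negation only on singleton Q
            (q,) = tuple(Q)
            out.append(('¬', ('a', q, frozenset({p}))))
    else:
        for q in Q:                     # conjunct answers: defender picks one q
            out.append(('*', ('a', p, frozenset({q}))))
    return out

trans = {("p", "a"): {"p1"}, ("q", "a"): {"q1", "q2"}}
ms = moves(('a', 'p', frozenset({'q'})), trans, ['a'])
assert ('a', ('a', 'p1', frozenset({'q1', 'q2'}))) in ms
assert ('¬', ('a', 'q', frozenset({'p'}))) in ms
```

Note how the observation move performs the powerset step on the right-hand side while the defender's ∗-answers collapse the set back to singletons.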

Note that, like in the formula game with arbitrary-depth formulas, the attacker could force infinite plays by cycling through conjunction moves (and also negation moves). However, they will not do this, as infinite plays are won by the defender.

Lemma 2. *The spectroscopy game* <sup>G</sup>[(p, {q})a] *is won by the defender precisely if* p *and* q *are bisimilar.*

This fact is a corollary of the well-known Hennessy–Milner theorem (HML characterizes bisimilarity), given that G is constructed as a simplification of GB.

Comparing G to the standard bisimulation game from the literature (with symmetry moves, see e.g. [3]), we can easily transfer attacker strategies from there. In the standard game, the attacker plays (p, q) ↝ (a, p′, q) with p →^a p′, and the defender has to answer by (a, p′, q) ↝ (p′, q′) with q →^a q′. In the spectroscopy game, the attacker can enforce analogous moves by playing (p, {q})_a ↝^a (p′, Q′)_a ↝^∧ (p′, Q′)_d, which will make the defender pick some (p′, Q′)_d ↝^∗ (p′, {q′})_a.

The opposite direction of transfer is not so easy, as the attacker has more ways of winning in G. But this asymmetry is precisely why we have to use the spectroscopy game instead of the standard bisimulation game if we want to learn about, for example, interesting failure-trace attacks.

Due to the subset construction over P, the game size clearly is exponential in the size of the state space. Going exponential is necessary, as we also want to characterize weaker preorders like the trace preorder, where exponential P-subset or Σ*-word constructions cannot be circumvented. However, for moderate real-world systems, such constructions will not necessarily show their full exponential blow-up (cf. [6]).

For concrete implementations, the subset construction also means that the costs of storing game nodes and of comparing two nodes are linear in the state space size. Complexity-wise, this factor is dominated by the overall exponentialities.

### 3.3 Building Distinguishing Formulas from Attacker Strategies

Definition 14 (Strategy formulas). *Given an attacker strategy* <sup>F</sup> <sup>⊆</sup> (G<sup>a</sup> <sup>×</sup> <sup>L</sup> <sup>×</sup> <sup>G</sup>) *for the spectroscopy game* <sup>G</sup>*, the set of* strategy formulas*,* Strat<sup>F</sup> (ga)*, is inductively defined by:*


*Example 4.* The attacks (P₁, {P₂})_a ↝^a (b + c, {b + d, c + d})_a ↝^∧ ↝^∗ ↝^¬ ↝^d (**0**, ∅)_a ↝^∧ give rise to the formula ⟨a⟩⋀{¬⟨d⟩⊤}, which can be written as ⟨a⟩¬⟨d⟩⊤.
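One can double-check such a distinction mechanically. Below is a minimal HML satisfaction checker (Python; the tuple encoding of formulas and the transition table for the Ex. 1 processes are our own reconstruction from the example, not the paper's implementation):

```python
def sat(s, phi, trans):
    """HML satisfaction: formulas are ('obs', a, phi) for ⟨a⟩phi,
    ('and', [phis]) for conjunction, ('neg', phi) for negation;
    ⊤ is the empty conjunction ('and', [])."""
    op = phi[0]
    if op == 'obs':
        _, a, sub = phi
        return any(sat(s2, sub, trans) for s2 in trans.get((s, a), set()))
    if op == 'and':
        return all(sat(s, sub, trans) for sub in phi[1])
    if op == 'neg':
        return not sat(s, phi[1], trans)

# P1 --a--> b+c, while P2 --a--> b+d and P2 --a--> c+d (cf. Ex. 1/4).
trans = {
    ("P1", "a"): {"b+c"}, ("b+c", "b"): {"0"}, ("b+c", "c"): {"0"},
    ("P2", "a"): {"b+d", "c+d"},
    ("b+d", "b"): {"0"}, ("b+d", "d"): {"0"},
    ("c+d", "c"): {"0"}, ("c+d", "d"): {"0"},
}
top = ('and', [])
phi = ('obs', 'a', ('and', [('neg', ('obs', 'd', top))]))  # ⟨a⟩⋀{¬⟨d⟩⊤}
assert sat("P1", phi, trans) and not sat("P2", phi, trans)
```

The final assertion confirms that the formula built from the attack indeed distinguishes P₁ from P₂.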

Definition 15 (Winning strategy graph). *Given the attacker winning region W_a and a starting position g₀ ∈ W_a, the* attacker winning strategy graph *F_a is the subgraph of the ↝-graph that can be visited from g₀ when following all ↝-edges unless they lead out of W_a.*

This graph can be cyclic. However, if the attacker plays inside their winning region according to Fa, they will always have paths to their final winning positions. So even though the attacker could loop (and thus lose), they can always end the game and *win* in the sense of Def. 5.

Theorem 1. *If* <sup>W</sup><sup>a</sup> *is the attacker winning region of the spectroscopy game* <sup>G</sup>*, every* <sup>ϕ</sup> <sup>∈</sup> Strat<sup>F</sup><sup>a</sup> ((p, {q})a) *distinguishes* <sup>p</sup> *from* <sup>q</sup>*.*

*Proof.* Due to Lem. 1, it suffices to show that <sup>ϕ</sup> <sup>∈</sup> Strat<sup>F</sup><sup>a</sup> ((p, Q)a) implies that the attacker wins <sup>G</sup>B[(p, Q, {ϕ})]. We proceed by induction on the structure of Strat<sup>F</sup><sup>a</sup> with arbitrary p, Q.


Note that the theorem is only one-way, as every distinguishing formula can be extended, without losing its distinguishing power, by an additional conjunct that happens to be true for *both* processes. Def. 14 will not find such bloated formulas.

Due to cycles in the game graph, Strat_{F_a} will usually yield infinitely many formulas. But we can make the output finite by injecting some way of discarding long formulas that unfold negation cycles or recursions of the underlying transition system. The next section discusses how to do this without discarding the formulas that are interesting from the point of view of the spectrum.

#### 3.4 Retrieving Cheapest Distinguishing Formulas

In our quest for the coarsest behavioral preorders (or equivalences) distinguishing two states, we are actually only interested in the formulas that are part of the *smallest observation languages* from the spectrum (Def. 11). We can think of the amount of HML expressiveness used by a formula as its *price*.

Let us look at the price structure of the spectrum from Def. 11. Table 1 gives an overview of how many syntactic HML-features the observation languages may use at most. (If formulas use fewer, they still are considered part of that observation language.) So, we are talking *budgets*, in the price analogy.


– Negations: How many negations may be visited when descending?
– Negation height: How high may the syntax trees under each negation be?

We say that a formula ϕ₁ *dominates* ϕ₂ if ϕ₁ has values lower than or equal to those of ϕ₂ in each dimension of the metrics, with at least one entry strictly lower. Let us note the following facts:
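Dominance is thus a Pareto-style comparison over the vector of metric values. A tiny sketch (Python; the particular tuple of dimensions used in the example is only illustrative):

```python
def dominates(m1, m2):
    """phi1 dominates phi2 iff m1 <= m2 in every metric dimension
    and strictly lower in at least one (Pareto-style domination)."""
    return (all(a <= b for a, b in zip(m1, m2))
            and any(a < b for a, b in zip(m1, m2)))

# Illustrative metric tuples, e.g. (height, conjunctions, negations, ...):
assert dominates((2, 0, 1, 1), (2, 1, 1, 1))
assert not dominates((2, 0, 1, 1), (1, 1, 1, 1))  # incomparable
assert not dominates((2, 1, 1, 1), (2, 1, 1, 1))  # equal: no strict entry
```

Formulas that are non-dominated in this sense form the Pareto front of cheapest distinguishing observations.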


Table 1. Dimensions of observation expressiveness.

<sup>4</sup> There is a special case for failure-traces where one positive flat branch may be counted as deep, if there are no other deep branches. Hence the ∗ in Table 1.

```
def game_spectroscopy(S, p0, q0):
    G^S = (G, G_a, ↝) := construct_spectroscopy_game(S)
    W_a := compute_winning_region(G^S)
    if (p0, {q0})_a ∈ W_a:
        F_a := winning_graph(G^S, W_a, (p0, {q0})_a)
        strats := empty map
        todo := [(p0, {q0})_a]
        while todo ≠ []:
            g := todo.dequeue()
            s_g := strats(g)
            if s_g = undefined:
                strats(g) := ∅
            gg := {g' | (g, ·, g') ∈ F_a ∧ strats(g') = undefined}
            if gg = ∅:
                s_g' := nonDominatedOrIF(Strat_{F_a, strats}(g))
                if s_g' ≠ s_g:
                    strats(g) := s_g'
                    todo.enqueueEachEnd({g* | (g*, ·, g) ∈ F_a ∧ g* ∉ todo})
            else:
                todo.enqueueEachFront(gg)
        return strats((p0, {q0})_a)
    else:
        R := {(p, q) | (p, {q})_a ∈ G_a \ W_a}
        return R
```
Algorithm 1: Spectroscopy procedure.


These observations justify our algorithm pruning from the set Strat_{F_a}(g) all formulas that are dominated with respect to the metrics by another formula in this set, unless they are *impossible trace futures* of the form ¬⟨a₁⟩⟨a₂⟩.... We moreover add formula height in terms of observations as a dimension of the metric, which leads to loop unfoldings being dominated by the shorter paths.

Algorithm 1 shows all the elements in concert. It constructs the spectroscopy game G<sup>S</sup> (Def. 13) and computes its attacker winning strategy graph <sup>F</sup><sup>a</sup> (Def. 15). If the attacker cannot win, the algorithm returns a bisimulation relation. Otherwise, it constructs the distinguishing formulas: It keeps a map strats of strategy formulas that have been found so far and a list of game positions todo that have

Figure 2. Screenshot of a linear-time–branching-time spectroscopy of the processes from Ex. 1.

to be updated. In every round, we take a game position g from todo. If some of its successors have not been visited yet, we add them to the front of the work list. Otherwise, we call Strat_{F_a, strats}(g) to compute distinguishing formulas using the follow-up formulas *found so far* in strats. This function mostly corresponds to Def. 14, with the twist that partial follow-ups are used instead of recursion, and that the construction for conjunctions is split onto attacker *and* defender positions. Of the found formulas, we keep only the non-dominated ones and the impossible trace futures. If the result changes strats(g), we enqueue each game predecessor to propagate the update there.

The algorithm structure is mostly the usual fixed-point machinery. It terminates because, for each state in a finite transition system, there must be a bound on the distinguishing mechanisms necessary with respect to our metrics, and Strat will only generate finitely many formulas under this bound. Keeping the impossible-future formulas unbounded is fine, because they have to be constructed from trace formulas, which are subject to the bound.

### 4 A Webtool for Equivalence Spectroscopy

We have implemented the game and the generation of minimal distinguishing formulas in the "Linear-time–Branching-time Spectroscope", a Scala.js program that can be run in the browser at https://concurrency-theory.org/ltbt-spectroscope/.

The tool (screenshot in Fig. 2) consists of a text editor to input basic CCS-style processes and a view of the transition system graph. When queried to compare two processes, the tool yields the cheapest distinguishing HML formulas it can find for both directions. Moreover, it displays the attacker-winning part of the spectroscopy game overlaid on the transition system. The latter can also be illuminating, at least for small and comparably deterministic transition systems. From the found formulas, the tool can also infer the finest fitting preorders for pairs of processes (Fig. 3).

To "benchmark" the quality of the distinguishing formulas, we have run the algorithm on all the finitary counterexample processes from the report version of "The Linear-time–Branching-time Spectrum" [12]. Table 2 reports the output of our tool on how to distinguish certain processes. The results match the (in)equivalences given in [12]. In some cases, the tool finds slightly better ways of distinguishing using impossible-futures equivalence, which was not known at the time of the original paper. All the computed formulas are quite elegant and minimal.

For each of the examples (from papers) we have considered, the browser's capacities sufficed to run the algorithm in 30 to 250 milliseconds. This does not mean that one should expect the algorithm to work for systems with thousands of states; there, the exponentialities of game and formula construction would take their toll. However, such big instances would usually stem from preexisting models, where one would very much hope for the designers to already know under which semantics to interpret their model. The practical applications of our browser tool lie more on the research side: When devising compiler optimizations, encodings, or distributed algorithms, it can be very handy to fully grasp the equivalence structure of isolated instances. The Linear-time–Branching-time Spectroscope supports this process.


Table 2. Formulas found by our implementation for some interesting processes from [12].

Figure 3. Tool output of finest preorders for transition systems. (Left: Ex. 1; right: a.b + a.(b + c) + a.c vs. a.b + a + a.c.)

### 5 Related Work and Alternatives

The game and the algorithm presented fill a blank spot in between the following previous directions of work:

Distinguishing formulas in general. Cleaveland [5] showed how to derive (non-minimal) distinguishing formulas for bisimulation equivalence from the execution of a bisimilarity checker based on the splitting of blocks. There, extending the construction to other notions of the spectrum was named as possible future work. We are not aware of any place where this has previously been done completely, but there are related islands, like the encoding between CTL and failure traces by Bruda and Zhang [7]. There is also more recent work, like Jasper et al. [15], extending to the generation of characteristic invariant formulas for bisimulation classes. Previous algorithms for bisimulation in-equivalence tend to generate formulas that alternate ⟨a⟩ and [b] observations while pushing negation to the innermost level. Such formulas cannot be linked to the spectrum as easily as ours.

Game-characterizations of the spectrum. After Shukla et al. [18] had shown how to characterize many notions of equivalence by HORNSAT games, Chen and Deng [4] presented a hierarchy of games characterizing all the equivalences of the linear-time–branching-time spectrum. The games from [4] cannot be applied as easily as ours in algorithms because they allow word moves and thus are infinite already for finite transition systems with cycles. Constructing distinguishing formulas from attacker strategies of these games would be less convenient than in our solution. Their parametric approach is comparable to fixing maximal price budgets *ex ante*. Our on-the-fly picking of minimal prices is more flexible.

Using game-characterizations for distinguishing formulas. There is recent work by Mika-Michalski et al. [16] on constructing distinguishing formulas using games in a more abstract coalgebraic setting focussed on the absence of bisimulation. The game and formula generation there, however, cannot easily be adapted for our purpose of performing a *spectroscopy* also for weaker notions.

Alternatives. One can also find the finest notion of equivalence between two states by *gradually minimizing* the transition system with ever coarser equivalences from bisimulation to trace equivalence until the states are conflated (possibly also trying branches). Within a big tool suite of highly optimized algorithms this should be quite efficient. We preferred the game approach, because it can uniformly be extended to the whole spectrum and also has the big upside of explaining the in-equivalences by distinguishing formulas.

One avenue of optimization for our approach that we have already tried is to run the formula search on a *directed acyclic subgraph* of the winning strategy graph. For our purpose of finding best-fitting equivalences, DAG-ification may preclude the algorithm from finding the right formulas. On the other hand, if one is mainly interested in a short distinguishing formula, for instance, one can speed up the process with DAG-ification by the order of remaining game rounds.

### 6 Conclusion

In this paper, we have established a convenient way of finding distinguishing formulas that use a minimal amount of expressiveness.

System analysis tools can employ the algorithm to tell their users in more detail *how equivalent* two process models are. While the generic approach is costly, instantiations to more specific, symbolic, compositional, on-the-fly or depth-bounded settings may enable wider applications. There are also some algorithmic tricks (like building the concrete formulas only after having found the price bounds and heuristics in handling the game graph) we have not explored in this paper.

So far, we have only looked at *strong* notions of equivalence [10]. We plan to verify the game in Isabelle/HOL and to extend our algorithm so that it also deals with *weak* notions of equivalence [11]. These equivalences abstract over τ-actions representing "internal activity" and correspond to observation languages with a special temporal observation (cf. [9]). This would generalize work on weak game characterizations such as de Frutos-Escrig et al.'s [8] and our own [2,3]. The vision is to arrive at *one* certifying algorithm that can yield finest equivalences and cheapest distinguishing formulas as witnesses for the whole discrete spectrum.

On a different note, our group is also working on an educational computer game about process equivalences.<sup>5</sup> The (theoretical) game of this paper can likely

<sup>5</sup> A prototype featuring equivalences between strong bisimulation and coupled simulation (result of Dominik Peacock's bachelor thesis) can be played at https://www.concurrency-theory.org/rvg-game/.

be adapted to go in the other direction: from formulas to distinguished transition systems. It may thereby synthesize levels for the (computer) game. So, in the end, all this might actually contribute to actual people having actual fun.

Acknowledgments. We are thankful to members of our research group (especially Kim Völlinger), participants of our course Modelle Dynamischer Systeme, and the anonymous reviewers for lots of helpful comments.

Data availability. The source code git repository of our implementation can be accessed via https://concurrency-theory.org/ltbt-spectroscope/code/. Code to reproduce the results presented in this paper is available on Zenodo [1].

### References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **On Satisficing in Quantitative Games**

Suguman Bansal<sup>1</sup>, Krishnendu Chatterjee<sup>2</sup>, and Moshe Y. Vardi<sup>3</sup>

<sup>1</sup> University of Pennsylvania, Philadelphia, USA suguman@seas.upenn.edu

<sup>2</sup> IST Austria, Klosterneuburg, Austria krishnendu.chatterjee@ist.ac.at

<sup>3</sup> Rice University, Houston, USA vardi@cs.rice.edu

**Abstract.** Several problems in planning and reactive synthesis can be reduced to the analysis of two-player quantitative graph games. *Optimization* is one form of analysis. We argue that in many cases it may be better to replace the optimization problem with the *satisficing problem*, where instead of searching for optimal solutions, the goal is to search for solutions that adhere to a given threshold bound.

This work defines and investigates the satisficing problem on a two-player graph game with the discounted-sum cost model. We show that while the satisficing problem can be solved using numerical methods just like the optimization problem, this approach does not render compelling benefits over optimization. When the discount factor is, however, an integer, we present another approach to satisficing, which is purely based on automata methods. We show that this approach is algorithmically more performant – both theoretically and empirically – and demonstrates the broader applicability of satisficing over optimization.

### **1 Introduction**

Quantitative properties of systems are increasingly being explored in automated reasoning [4,14,16,20,21,26]. In decision-making domains such as planning and reactive synthesis, quantitative properties have been deployed to describe soft constraints such as quality measures [11], cost and resources [18,22], rewards [31], and the like. Since these constraints are soft, it suffices to generate solutions that are good enough w.r.t. the quantitative property.

Existing approaches on the analysis of quantitative properties have, however, primarily focused on optimization of these constraints, i.e., to generate optimal solutions. We argue that there may be disadvantages to searching for optimal solutions, where good enough ones may suffice. First, optimization may be more expensive than searching for good-enough solutions. Second, optimization restricts the search-space of possible solutions, and thus could limit the broader applicability of the resulting solutions. For instance, to generate solutions that operate within battery life, it is too restrictive to search for solutions with minimal battery consumption. Besides, solutions with minimal battery consumption may be limited in their applicability, since they may not satisfy other goals, such as desirable temporal tasks.

To this end, this work focuses on directly searching for good-enough solutions. We propose an alternate form of analysis of quantitative properties in

© The Author(s) 2021

J. F. Groote and K. G. Larsen (Eds.): TACAS 2021, LNCS 12651, pp. 20–37, 2021.

https://doi.org/10.1007/978-3-030-72016-2_2

which the objective is to search for a solution that adheres to a given threshold bound, possibly derived from a physical constraint such as battery life. We call this the satisficing problem, a term popularized by H. A. Simon in economics to mean *satisfy and suffice*, implying a search for good-enough solutions [1]. Through theoretical and empirical investigation, we make the case that satisficing is algorithmically more performant than optimization and, further, that satisficing solutions may have broader applicability than optimal solutions.

This work formulates and investigates the satisficing problem on two-player, finite-state games with the discounted-sum (DS) cost model, which is a standard cost-model in decision-making domains [24,25,28]. In these games, players take turns to pass a token along the transition relation between the states. As the token is pushed around, the play accumulates costs along the transitions using the DS cost model. The players are assumed to have opposing objectives: one player maximizes the cost, while the other player minimizes it. We define the satisficing problem as follows: Given a threshold value <sup>v</sup> <sup>∈</sup> <sup>Q</sup>, does there exist a strategy for the minimizing (or maximizing) player that ensures the cost of all resulting plays is strictly or non-strictly lower (or greater) than the threshold v?

Clearly, the satisficing problem is decidable, since the optimization problem on these quantitative games is known to be solvable in pseudo-polynomial time [17,23,32]. To design an algorithm for satisficing, we first adapt the celebrated value-iteration (VI) based algorithm for optimization [32] (§ 3). We show, however, that this algorithm, called VISatisfice, displays the same complexity as optimization and hence renders no complexity-theoretic advantage. To pin down the worst-case complexity, we perform a thorough worst-case analysis of VI for optimization. It is interesting that such a thorough analysis had hitherto been absent from the literature, despite the popularity of VI. To address this gap, we first prove that VI must be executed for Θ(|V|) iterations to compute the optimal value, where V and E refer to the sets of states and transitions of the quantitative game. Next, to compute the overall complexity, we take the cost of arithmetic operations into account as well, since they appear in abundance in VI. We demonstrate an orders-of-magnitude difference between the complexity of VI under different cost models of arithmetic. For instance, for integer discount factors, we show that VI is O(|V|·|E|) and O(|V|²·|E|) under the unit-cost and bit-cost models of arithmetic, respectively. Clearly, this shows that VI for optimization, and hence VISatisfice, does not scale to large quantitative games.
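For intuition, the value-iteration scheme discussed above can be sketched as follows (Python; this is our own illustration with a fixed iteration count, not the paper's implementation or its Θ(|V|) iteration analysis):

```python
def value_iteration(V0, V1, E, gamma, d, iters):
    """Value iteration for a discounted-sum game (sketch): the minimizer
    owns V0, the maximizer owns V1, `gamma` assigns integer edge costs,
    and d > 1 is the discount factor. Each update takes the best edge
    cost plus the discounted value of the successor."""
    val = {v: 0.0 for v in V0 | V1}
    for _ in range(iters):
        # The comprehension reads the previous `val` before rebinding it.
        val = {v: (min if v in V0 else max)(
                   gamma[(v, u)] + val[u] / d for u in E[v])
               for v in V0 | V1}
    return val

# One minimizer state with a cost-1 self-loop and discount factor 2:
# the value converges to 1 + 1/2 + 1/4 + ... = 2.
v = value_iteration({"s"}, set(), {"s": ["s"]}, {("s", "s"): 1}, 2, 60)
assert abs(v["s"] - 2.0) < 1e-9
```

Each iteration touches every edge once, which is where the per-iteration |E| factor in the complexity bounds comes from.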

We then present a purely automata-based approach for satisficing (§ 4). While this approach applies to integer discount factors only, it solves satisficing in <sup>O</sup>(|<sup>V</sup> <sup>|</sup> <sup>+</sup> <sup>|</sup>E|) time. This shows that there is a fundamental separation in complexity between satisficing and VI-based optimization, as even the lower bound on the number of iterations in VI is higher. In this approach, the satisficing problem is reduced to solving a safety or reachability game. Our core observation is that the criteria to fulfil satisficing with respect to threshold value <sup>v</sup> <sup>∈</sup> <sup>Q</sup> can be expressed as membership in an automaton that accepts a weight sequence A iff DS(A, d) <sup>R</sup> <sup>v</sup> holds, where d > 1 is the discount factor and <sup>R</sup> ∈ {≤, <sup>≥</sup>, <, >}. In existing literature, such automata are called comparator automata (comparators, in short) when the threshold value v = 0 [6,7]. They are known to have a compact safety or co-safety automaton representation [9,19], which could be used to reduce the satisficing problem with zero threshold value. To solve satisficing for arbitrary threshold values <sup>v</sup> <sup>∈</sup> <sup>Q</sup>, we extend existing results on comparators to permit arbitrary but fixed threshold values <sup>v</sup> <sup>∈</sup> <sup>Q</sup>. An empirical comparison between the performance of VISatisfice, VI for optimization, and automata-based solution for satisficing shows that the latter outperforms the others in efficiency, scalability, and robustness.

In addition to improved algorithmic performance, we demonstrate that satisficing solutions have broader applicability than optimal ones (§ 5). We examine this with respect to their ability to extend to temporal goals. That is, the problem is to find optimal/satisficing solutions that also satisfy a given temporal goal. Prior results have shown this to not be possible with optimal solutions [13]. In contrast, we show satisficing extends to temporal goals when the discount factor is an integer. This occurs because both satisficing and satisfaction of temporal goals are solved via automata-based techniques, which can be easily integrated.

In summary, this work contributes to showing that satisficing has algorithmic and applicability advantages over optimization in (deterministic) quantitative games. In particular, we have shown that the automata-based approach for satisficing have advantages over approaches in numerical methods like valueiteration. This gives yet another evidence in favor of automata-based quantitative reasoning and opens up several compelling directions for future work.

### **2 Preliminaries**

### **2.1 Two-player graph games**

Reachability and safety games. Both reachability and safety games are defined over a structure G = (V = V₀ ⊎ V₁, v_init, E, F) [30]. It consists of a directed graph (V, E) and a partition (V₀, V₁) of its states V. State v_init is the initial state of the game. The set of successors of a state v is designated by vE. For convenience, we assume that every state has at least one outgoing edge, i.e., vE ≠ ∅ for all v ∈ V. F ⊆ V is a non-empty set of states; F is referred to as the set of accepting states in reachability games and rejecting states in safety games, respectively.

A play of a game involves two players, P0 and P1, who create an infinite path by moving a token along the transitions as follows: at the beginning, the token is at the initial state; if the current position v belongs to Vi, then Pi chooses the successor state from vE. Formally, a play ρ = v0 v1 v2 ... is an infinite sequence of states such that the first state v0 = vinit and each pair of successive states is a transition, i.e., (vk, vk+1) ∈ E for all k ≥ 0. A play is winning for player P1 in a reachability game if it visits an accepting state, and winning for player P0 otherwise. The opposite holds in safety games, i.e., a play is winning for P1 if it does not visit any rejecting state, and winning for P0 otherwise.

A strategy for a player is a recipe that, based on the history of the play, guides the player on which state to move to next. A strategy is winning for a player Pi if for all strategies of the opponent P1−i the resulting plays are winning for Pi. To solve a graph game means to determine whether there exists a winning strategy for player P1. Reachability and safety games are solved in O(|V| + |E|) time.
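The O(|V| + |E|) bound comes from the classic attractor computation. The sketch below is a standard textbook construction, not code from the paper; the function name and the edge-list encoding are our own choices. Following the convention above, P1 owns the states in V1 and tries to reach F (states not in V1 belong to P0).

```python
from collections import defaultdict

def solve_reachability(V1, E, F):
    """Return the set of states from which P1 (the reaching player,
    owning the states in V1) can force a visit to F.  E is a list of
    (u, v) edges; each edge is inspected O(1) times: O(|V| + |E|)."""
    succ_count = defaultdict(int)   # out-degree of each state
    pred = defaultdict(list)        # reverse adjacency lists
    for u, v in E:
        succ_count[u] += 1
        pred[v].append(u)
    attr = set(F)                   # attractor, initialized with F
    queue = list(F)
    count = dict(succ_count)        # edges not yet known to enter attr
    while queue:
        v = queue.pop()
        for u in pred[v]:
            if u in attr:
                continue
            if u in V1:             # P1 needs just one edge into attr
                attr.add(u)
                queue.append(u)
            else:                   # P0 is trapped once all edges lead there
                count[u] -= 1
                if count[u] == 0:
                    attr.add(u)
                    queue.append(u)
    return attr
```

A play from any state in the returned set can be forced into F; safety games are solved dually, by checking whether the initial state can stay out of the attractor of the rejecting states.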

Quantitative graph games. A quantitative graph game (or quantitative game, in short) is defined over a structure G = (V = V0 ∪ V1, vinit, E, γ). V, V0, V1, vinit, E, plays, and strategies are defined as earlier. Each transition of the game is associated with a cost determined by the cost function γ : E → Z. The cost sequence of a play ρ is the sequence of costs w0 w1 w2 ... such that wk = γ((vk, vk+1)) for all k ≥ 0. Given a discount factor d > 1, the cost of a play ρ, denoted wt(ρ), is the discounted sum of its cost sequence, i.e., wt(ρ) = DS(ρ, d) = w0 + w1/d + w2/d² + ... .
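For concreteness, the discounted sum of a (prefix of a) cost sequence can be computed directly; this small helper is our illustration, not code from the paper:

```python
def discounted_sum(costs, d):
    """DS of a finite cost sequence: w0 + w1/d + w2/d^2 + ...
    Truncating an infinite play after k costs approximates its DS
    within (1/d^(k-1)) * mu/(d-1), where mu bounds the |costs|."""
    return sum(w / d ** k for k, w in enumerate(costs))
```

For example, discounted_sum([1, 2, 4], 2) evaluates to 1 + 1 + 1 = 3.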

#### **2.2 Automata and formal languages**

Büchi automata. A Büchi automaton is a tuple A = (S, Σ, δ, sI, F), where S is a finite set of states, Σ is a finite input alphabet, δ ⊆ (S × Σ × S) is the transition relation, state sI ∈ S is the initial state, and F ⊆ S is the set of accepting states [30]. A Büchi automaton is deterministic if for all states s and inputs a, |{s′ | (s, a, s′) ∈ δ}| ≤ 1. For a word w = w0 w1 ··· ∈ Σω, a run ρ of w is a sequence of states s0 s1 ... s.t. s0 = sI and τi = (si, wi, si+1) ∈ δ for all i. Let inf(ρ) denote the set of states that occur infinitely often in run ρ. A run ρ is an accepting run if inf(ρ) ∩ F ≠ ∅. A word w is an accepting word if it has an accepting run. The language of a Büchi automaton A is the set of all words accepted by A. Languages accepted by Büchi automata are called ω-regular.

Safety and co-safety languages. Let L ⊆ Σω be a language over alphabet Σ. A finite word w ∈ Σ∗ is a bad prefix for L if for all infinite words y ∈ Σω, w · y ∉ L. A language L is a safety language if every word x ∉ L has a bad prefix for L [3]. A co-safety language is the complement of a safety language [19]. Safety and co-safety languages that are ω-regular are represented by specialized Büchi automata called safety and co-safety automata, respectively.

Comparison language and comparator automata. Given an integer bound μ > 0, discount factor d > 1, and relation R ∈ {<, >, ≤, ≥, =, ≠}, the comparison language with upper bound μ, relation R, and discount factor d is the language of words over the alphabet Σ = {−μ, ..., μ} that contains A ∈ Σω iff DS(A, d) R 0 holds [5,9]. The comparator automaton with upper bound μ, relation R, and discount factor d is the automaton that accepts the corresponding comparison language [6]. Depending on R, these languages are safety or co-safety [9]. A comparison language is said to be ω-regular if its comparator is a Büchi automaton. Comparison languages are ω-regular iff the discount factor is an integer [7].

### **3 Satisficing via Optimization**

This section shows that there are no complexity-theoretic benefits to solving the satisficing problem via algorithms for the optimization problem.

§ 3.1 formally defines the satisficing problem and reviews the celebrated value-iteration (VI) algorithm for optimization by Zwick and Paterson (ZP). While ZP claim without proof that the algorithm runs in pseudo-polynomial time [32], its worst-case analysis is absent from the literature. This section presents a detailed account of that analysis and exposes the dependence of VI's worst-case complexity on the discount factor d > 1 and on the cost model for arithmetic operations, i.e., the unit-cost or bit-cost model. The analysis is split into two parts: first, § 3.2 shows it is sufficient to terminate after a finite number of iterations; next, § 3.3 accounts for the cost of arithmetic operations per iteration to compute VI's worst-case complexity under the unit-cost and bit-cost models of arithmetic. Finally, § 3.4 presents and analyzes VISatisfice, our VI-based algorithm for satisficing.

### **3.1 Satisficing and Optimization**

**Definition 1 (Satisficing problem).** Given a quantitative graph game G and a threshold value <sup>v</sup> <sup>∈</sup> <sup>Q</sup>, the satisficing problem is to determine whether the minimizing (or maximizing) player has a strategy that ensures the cost of all resulting plays is strictly or non-strictly lower (or greater) than the threshold v.

The satisficing problem can clearly be solved by solving the optimization problem. The optimal cost of a quantitative game is the value such that the maximizing and minimizing players can each guarantee that the cost of plays is at least and at most this value, respectively.

**Definition 2 (Optimization problem).** Given a quantitative graph game G, the optimization problem is to compute the optimal cost from all possible plays from the game, under the assumption that the players have opposing objectives to maximize and minimize the cost of plays, respectively.

Seminal work by Zwick and Paterson showed that the optimization problem is solved by the value-iteration algorithm presented here [32]. Essentially, the algorithm plays a min-max game between the two players. Let wtk(v) denote the optimal cost of a k-length game that begins in state v ∈ V. Then wtk(v) can be computed using the following equations: the optimal cost of a 1-length game beginning in state v ∈ V is max{γ(v, w) | (v, w) ∈ E} if v ∈ V0 and min{γ(v, w) | (v, w) ∈ E} if v ∈ V1. Given the optimal cost of a k-length game, the optimal cost of a (k + 1)-length game is computed as follows:

$$wt\_{k+1}(v) = \begin{cases} \max\{\gamma(v,w) + \frac{1}{d} \cdot wt\_k(w) | (v,w) \in E\} \text{ if } v \in V\_0\\ \min\{\gamma(v,w) + \frac{1}{d} \cdot wt\_k(w) | (v,w) \in E\} \text{ if } v \in V\_1 \end{cases}$$

Let W be the optimal cost. Then W = lim_{k→∞} wtk(vinit) [27,32].
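The recurrence can be implemented directly. The sketch below is our own rendering (successor lists and a cost dictionary as the game encoding, not notation from the paper); exact rationals are used so the iterates match the recurrence exactly:

```python
from fractions import Fraction

def value_iteration(V0, V1, succ, gamma, d, iters):
    """Compute wt_iters(v) for every state v via the min-max recurrence
    wt_{k+1}(v) = opt{gamma(v, w) + wt_k(w)/d : (v, w) in E},
    where opt is max on V0 states and min on V1 states."""
    d = Fraction(d)
    wt = {v: Fraction(0) for v in V0 | V1}   # wt_0 = 0 gives wt_1 = opt gamma
    for _ in range(iters):
        wt = {v: (max if v in V0 else min)(
                  Fraction(gamma[(v, w)]) + wt[w] / d for w in succ[v])
              for v in wt}
    return wt
```

On a single maximizer state with a self-loop of cost 1 and d = 2, three iterations give 1 + 1/2 + 1/4 = 7/4, converging toward the optimal cost 2.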

#### **3.2 VI: Number of iterations**

The VI algorithm described above converges only in the limit and does not terminate on its own. To compute the algorithm's worst-case complexity, we establish a linear bound on the number of iterations that is sufficient to compute the optimal cost. We also establish a matching lower bound, showing that our analysis is tight.

Upper bound on number of iterations. The upper bound computation utilizes one key result from the existing literature: there exist memoryless strategies for both players such that the cost of the resulting play is the optimal cost [27]. Then there must exist an optimal play in the form of a simple lasso in the quantitative game, where a lasso is a play of the form v0 v1 ... vn (s0 s1 ... sm)ω. We call the initial segment v0 v1 ... vn its head and the cycle segment s0 s1 ... sm its loop. A lasso is simple if the states in {v0, ..., vn, s0, ..., sm} are pairwise distinct. We begin our proof by deriving constraints on the optimal cost from the simple-lasso structure of an optimal play (Corollary 1 and Corollary 2).

Let l = a0 ... an (b0 ... bm)ω be the cost sequence of a lasso such that l1 = a0 ... an and l2 = b0 ... bm are the cost sequences of the head and the loop, respectively. Then the following can be said about DS(l1 · l2ω, d):

**Lemma 1.** Let l = l1 · (l2)ω represent an integer cost sequence of a lasso, where l1 and l2 are the cost sequences of the head and loop of the lasso. Let d = p/q be the discount factor. Then DS(l, d) is a rational number with denominator at most (p^|l2| − q^|l2|) · p^|l1|.

Lemma 1 is proven by unrolling DS(l1 · l2ω, d). Then the first constraint on the optimal cost is as follows:

**Corollary 1.** Let G = (V, vinit, E, γ) be a quantitative graph game. Let d = p/q be the discount factor. Then the optimal cost of the game is a rational number with denominator at most (p^|V| − q^|V|) · p^|V|.

Proof. Recall that there exists a simple lasso that achieves the optimal cost. Since a simple lasso has length at most |V|, the lengths of its head and loop are at most |V| each. So the expression from Lemma 1 simplifies to (p^|V| − q^|V|) · p^|V|.

The second constraint has to do with the minimum non-zero difference between the cost of simple lassos:

**Corollary 2.** Let G = (V, vinit, E, γ) be a quantitative graph game. Let d = p/q be the discount factor. Then the minimal non-zero difference between the costs of simple lassos is a rational with denominator at most (p^|V| − q^|V|)² · p^(2·|V|).

Proof. Given two rational numbers with denominator at most a, an upper bound on the denominator of their minimal non-zero difference is a². Then, using the result from Corollary 1, we immediately obtain that the minimal non-zero difference between the costs of two lassos is a rational number with denominator at most (p^|V| − q^|V|)² · p^(2·|V|).

For notational convenience, let boundW = (p^|V| − q^|V|) · p^|V| and bounddiff = (p^|V| − q^|V|)² · p^(2·|V|). W.l.o.g. |V| > 1. Since 1/bounddiff < 1/boundW, there is at most one rational number with denominator boundW or less in any interval of size 1/bounddiff. Thus, if we can identify an interval of size less than 1/bounddiff around the optimal cost, then by Corollary 1 the optimal cost is the unique rational number with denominator boundW or less in this interval.

**Fig. 1.** Sketch of game graph which requires Ω(|V |) iterations

Thus, the final question is to identify a small enough interval (of size 1/bounddiff or less) such that the optimal cost lies within it. To find an interval around the optimal cost, we use a finite-horizon approximation of the optimal cost:

**Lemma 2.** Let W be the optimal cost in quantitative game G. Let μ > 0 be the maximum of absolute values of costs on transitions in G. Then, for all k ∈ N,

$$wt\_k(v\_{\text{init}}) - \frac{1}{d^{k-1}} \cdot \frac{\mu}{d-1} \le W \le wt\_k(v\_{\text{init}}) + \frac{1}{d^{k-1}} \cdot \frac{\mu}{d-1}$$

Proof. Since W is the limit of wtk(vinit) as k → ∞, W must lie between the minimum and maximum cost possible if the k-length game is extended to an infinite-length game. The minimum possible extension occurs when the k-length game is extended by rounds in which the cost incurred in each round is −μ. Therefore, the minimum possible value is wtk(vinit) − (1/d^(k−1)) · μ/(d−1). Similarly, the maximum possible value is wtk(vinit) + (1/d^(k−1)) · μ/(d−1).

Now that we have an interval around the optimal cost, we can compute the number of iterations of VI required to make it smaller than 1/bounddiff.

**Theorem 1.** Let G = (V,vinit,E, γ) be a quantitative graph game. Let μ > 0 be the maximum of absolute value of costs along transitions. The number of iterations required by the value-iteration algorithm is

1. O(|V|) when the discount factor d ≥ 2,
2. O(log(μ)/(d−1) + |V|) when the discount factor 1 < d < 2.

Proof (Sketch). As discussed in Corollaries 1–2 and Lemma 2, the optimal cost is the unique rational number with denominator boundW or less within the interval (wtk(vinit) − (1/d^(k−1)) · μ/(d−1), wtk(vinit) + (1/d^(k−1)) · μ/(d−1)) for a large enough k > 0 such that the interval's size is less than 1/bounddiff. Thus, our task is to determine the value of k > 0 such that 2μ/((d−1) · d^(k−1)) ≤ 1/bounddiff holds. The case d ≥ 2 is easy to simplify. The case 1 < d < 2 involves approximations of logarithms of small values.

Lower bound on number of iterations of VI. We establish a matching lower bound of Ω(|V|) iterations to show that our analysis is tight.

Consider the sketch of a quantitative game in Fig. 1. Let all states belong to the maximizing player. Hence, the optimization problem reduces to searching for a path with optimal cost. Now let the loop on the right-hand side (RHS) be larger than the loop on the left-hand side (LHS). For carefully chosen values of w and lengths of the loops, one can show that the optimal path in a k-length game follows the RHS loop when k is small but the LHS loop when k is large. This way, the correct maximal value is obtained only at a large value of k. Hence the VI algorithm runs for at least enough iterations that the optimal path lies in the LHS loop. By meticulous reverse engineering of the sizes of both loops and the value of w, one can guarantee that k = Ω(|V|).

#### **3.3 Worst-case complexity analysis of VI for optimization**

Finally, we complete the worst-case complexity analysis of VI for optimization. We account for the cost of arithmetic operations since they appear in abundance in VI. We demonstrate that there are orders-of-magnitude differences in complexity under different models of arithmetic, namely unit-cost and bit-cost.

Unit-cost model. Under the unit-cost model of arithmetic, all arithmetic operations are assumed to take constant time.

**Theorem 2.** Let G = (V,vinit,E, γ) be a quantitative graph game. Let μ > 0 be the maximum of absolute value of costs along transitions. The worst-case complexity of the optimization problem under unit-cost model of arithmetic is

1. O(|V| · |E|) when the discount factor d ≥ 2,
2. O(log(μ) · |E|/(d−1) + |V| · |E|) when the discount factor 1 < d < 2.

Proof. Each iteration takes O(|E|) time since every transition is visited once. Thus, the complexity is O(|E|) multiplied by the number of iterations (Theorem 1).

Bit-cost model. Under the bit-cost model, the cost of arithmetic operations depends on the size of the numerical values. Integers are represented in their bit-wise representation. A rational number r/s is represented as a tuple of the bit-wise representations of the integers r and s. For two integers of lengths n and m, the cost of their addition is O(m + n) and of their multiplication O(m · n).
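The effect is easy to observe with exact rationals. This toy loop (our illustration, not the paper's algorithm) applies a VI-style update x ← 1 + x/d with d = p/q = 3/2 and tracks the denominator of x, which gains a factor p per iteration, so its bit-length grows linearly:

```python
from fractions import Fraction

def denominator_growth(steps, p=3, q=2):
    """Apply x <- 1 + x/d with d = p/q and return the denominator of x
    after each step; its bit-length grows linearly in the step count."""
    d, x = Fraction(p, q), Fraction(0)
    dens = []
    for _ in range(steps):
        x = 1 + x / d
        dens.append(x.denominator)
    return dens
```

denominator_growth(5) yields [1, 3, 9, 27, 81]: each arithmetic operation on these values costs more than the last, which is what the bit-cost analysis charges for.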

**Theorem 3.** Let G = (V, vinit, E, γ) be a quantitative graph game. Let μ > 0 be the maximum of absolute values of costs along transitions. Let d = p/q > 1 be the discount factor. The worst-case complexity of the optimization problem under the bit-cost model of arithmetic is

$$\begin{array}{ll} \text{1. } \mathcal{O}(|V|^2 \cdot |E| \cdot \log p \cdot \max\{\log \mu, \log p\}) \text{ when } d \ge 2, \\\text{2. } \mathcal{O}\left(\left(\frac{\log(\mu)}{d-1} + |V|\right)^2 \cdot |E| \cdot \log p \cdot \max\{\log \mu, \log p\}\right) \text{ when } 1 < d < 2. \end{array}$$

Proof (Sketch). Since arithmetic operations incur a cost and the length of the representation of intermediate costs increases linearly with each iteration, we can show that the cost of conducting the j-th iteration is O(|E| · j · log μ · log p). Summing over all iterations yields the given expressions.

Remarks on integer discount factors. Our analysis shows that when the discount factor is an integer (d ≥ 2), VI requires Θ(|V|) iterations. Its worst-case complexity is, therefore, O(|V| · |E|) and O(|V|² · |E|) under the unit-cost and bit-cost models of arithmetic, respectively. From a practical point of view, the bit-cost model is more relevant since implementations of VI use multi-precision libraries to avoid floating-point errors. While one may argue that the upper bounds in Theorem 3 could be tightened, they would not improve significantly due to the Ω(|V|) lower bound on the number of iterations.

### **3.4 Satisficing via value-iteration**

We present our first algorithm for the satisficing problem. It is an adaptation of VI. However, we see that it does not fare better than VI for optimization.

The VI-based algorithm for satisficing proceeds as follows: perform VI for optimization and terminate as soon as one of the following occurs: (a) VI completes the number of iterations given by Theorem 1, or (b) the threshold value falls outside the interval defined in Lemma 2. Either way, one can tell how the threshold value relates to the optimal cost, which solves satisficing. Clearly, (a) needs as many iterations as optimization; (b) does not reduce the number of iterations either, since that number is inversely proportional to the distance between the optimal cost and the threshold value:
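A sketch of this procedure (we call it vi_satisfice here; the game encoding and names are our assumptions, and only termination test (b) is shown explicitly, with a cap standing in for the Theorem 1 iteration bound of test (a)):

```python
from fractions import Fraction

def vi_satisfice(V0, V1, succ, gamma, init, d, v, mu, max_iters):
    """Run VI; stop once the threshold v leaves the Lemma 2 interval
    [wt_k(init) - err_k, wt_k(init) + err_k], err_k = mu/((d-1)*d^(k-1)),
    which decides how v compares to the optimal cost W."""
    d, v = Fraction(d), Fraction(v)
    wt = {s: Fraction(0) for s in V0 | V1}
    for k in range(1, max_iters + 1):
        wt = {s: (max if s in V0 else min)(
                  Fraction(gamma[(s, t)]) + wt[t] / d for t in succ[s])
              for s in wt}
        err = Fraction(mu) / ((d - 1) * d ** (k - 1))
        if v < wt[init] - err:       # v below the interval, so W > v
            return "optimum exceeds threshold"
        if v > wt[init] + err:       # v above the interval, so W < v
            return "optimum below threshold"
    return "undecided within max_iters"
```

On the single-state self-loop game with cost 1 and d = 2 (optimal cost 2), a far-away threshold such as v = 3 is decided after one iteration, whereas v close to 2 keeps the loop running, illustrating the variable behavior discussed below.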

**Theorem 4.** Let G = (V, vinit, E, γ) be a quantitative graph game with optimal cost W. Let v ∈ Q be the threshold value. Then the number of iterations taken by a VI-based algorithm for the satisficing problem is min{O(|V|), O(log(μ/|W − v|))} if d ≥ 2 and min{O(log(μ)/(d−1) + |V|), O(log(μ/|W − v|))} if 1 < d < 2.

Observe that this bound is tight since the lower bounds from optimization apply here as well. The worst-case complexity analysis can be completed using computations similar to § 3.3. Since the number of iterations is identical to Theorem 1, the worst-case complexity is identical to Theorems 2 and 3, showing no theoretical improvement. However, implementations may terminate sooner for threshold values far from the optimal cost, while retaining worst-case behavior for values close to it. The catch is that since the optimal cost is unknown a priori, this leads to highly variable and non-robust performance.

### **4 Satisficing via Comparators**

Our second algorithm for satisficing is based purely on automata methods. While this approach operates with integer discount factors only, it runs in time linear in the size of the quantitative game. This is lower than the number of iterations required by VI, let alone the worst-case complexities of VI. The approach reduces satisficing to solving a safety or reachability game using comparator automata.

The intuition is as follows: given threshold value v ∈ Q and relation R, let the satisficing problem be to ensure that the cost of plays relates to v by R. Then a play ρ is winning for satisficing with v and R if its cost sequence A satisfies DS(A, d) R v, where d > 1 is the discount factor. When d is an integer and v = 0, this simply checks whether A is in the safety/co-safety comparator, hence yielding the reduction.

The caveat is that the above applies to v = 0 only. To overcome this, we extend the theory of comparators to permit arbitrary threshold values v ∈ Q. We find that the results for v = 0 carry over to arbitrary v ∈ Q and offer compact comparator constructions (§ 4.1). These new comparators are then used to reduce satisficing to safety and reachability games, yielding an efficient and scalable algorithm (§ 4.2). Finally, to procure a well-rounded view of its performance, we conduct an empirical evaluation in which this comparator-based approach outperforms the VI approaches (§ 4.3).

# **4.1 Foundations of comparator automata with threshold** *v ∈ Q*

This section extends the existing literature on comparators with threshold value v = 0 [6,5,9] to permit non-zero thresholds. The properties we investigate are safety/co-safety and ω-regularity. We begin with formal definitions:

**Definition 3 (Comparison language with threshold** v ∈ Q**).** For an integer upper bound μ > 0, discount factor d > 1, equality or inequality relation R ∈ {<, >, ≤, ≥, =, ≠}, and a threshold value v ∈ Q, the comparison language with upper bound μ, relation R, discount factor d, and threshold value v is the language of infinite words over the alphabet Σ = {−μ, ..., μ} that contains A ∈ Σω iff DS(A, d) R v holds.

**Definition 4 (Comparator automata with threshold** v ∈ Q**).** For an integer upper bound μ > 0, discount factor d > 1, equality or inequality relation R ∈ {<, >, ≤, ≥, =, ≠}, and a threshold value v ∈ Q, the comparator automaton with upper bound μ, relation R, discount factor d, and threshold value v is an automaton that accepts the comparison language with upper bound μ, relation R, discount factor d, and threshold value v.

**Safety and co-safety of comparison languages.** The primary observation is that to determine if DS(A, d) R v holds, it should be sufficient to examine finite-length prefixes of A since weights later on get heavily discounted. Thus,

**Theorem 5.** Let μ > 1 be the integer upper bound. For arbitrary discount factor d > 1 and threshold value v ∈ Q,

1. Comparison languages with relation R ∈ {≤, ≥, =} are safety languages.
2. Comparison languages with relation R ∈ {<, >, ≠} are co-safety languages.
Proof. The proof is identical to that for threshold value <sup>v</sup> = 0 from [9].

**Regularity of comparison languages.** Prior work on threshold value v = 0 shows that a comparator is ω-regular iff the discount factor is an integer [7]. We show the same result for arbitrary threshold values <sup>v</sup> <sup>∈</sup> <sup>Q</sup>.

First of all, comparators with an arbitrary threshold value are trivially not ω-regular for non-integer discount factors, since that already holds when v = 0.

The rest of this section proves ω-regularity with arbitrary threshold values for integer discount factors. But first, let us introduce some notation: since v ∈ Q, w.l.o.g. we assume that it has an n-length representation v = v[0] v[1] ... v[m] (v[m+1] v[m+2] ... v[n])ω. By abuse of notation, we denote both the expression v[0] v[1] ... v[m] (v[m+1] v[m+2] ... v[n])ω and the value DS(v[0] v[1] ... v[m] (v[m+1] v[m+2] ... v[n])ω, d) by v.

We will construct a Büchi automaton for the comparison language L≤ for relation ≤, threshold value v ∈ Q, and an integer discount factor. This is sufficient to prove ω-regularity for all relations since Büchi automata are closed under Boolean operations.

From safety/co-safety of comparison languages, we argue that it is sufficient to examine the discounted sum of finite-length weight sequences to know whether their infinite extensions will be in L≤. For instance, if the discounted sum of a finite-length weight sequence W is very large, W could be a bad prefix of L≤. Similarly, if the discounted sum of a finite-length weight sequence W is very small, then for all of its infinite-length bounded extensions Y, DS(W · Y, d) ≤ v. Thus, a mathematical characterization of very large and very small would formalize a criterion for membership of sequences in L≤ based on their finite prefixes.

To this end, we use the concept of a recoverable gap (or gap value), which measures the distance of the discounted sum of a finite sequence from 0 [12]. The recoverable gap of a finite weight sequence W with discount factor d, denoted gap(W, d), is defined as follows: if W = ε (the empty sequence), gap(ε, d) = 0, and gap(W, d) = d^(|W|−1) · DS(W, d) otherwise. Then Lemma 3 formalizes very large and very small in Item 1 and Item 2, respectively, w.r.t. recoverable gaps. As for notation, given a sequence A, let A[... i] denote its i-length prefix:

**Lemma 3.** Let μ > 0 be the integer upper bound and d > 1 be the discount factor. Let v ∈ Q be the threshold value s.t. v = v[0] ... v[m] (v[m+1] ... v[n])ω. Let W be a non-empty, bounded, finite-length weight sequence.

1. For all infinite-length, bounded extensions Y, DS(W · Y, d) > v holds iff gap(W − v[... |W|], d) > (1/d) · DS(v[|W| ...], d) + μ/(d−1).
2. For all infinite-length, bounded extensions Y, DS(W · Y, d) ≤ v holds iff gap(W − v[... |W|], d) ≤ (1/d) · DS(v[|W| ...], d) − μ/(d−1).
Proof. We present the proof of one direction of Item 1; the others follow similarly. Let W be s.t. for every infinite-length, bounded Y, DS(W · Y, d) > v holds. Then

DS(W, d) + (1/d^|W|) · DS(Y, d) > DS(v[... |W|] · v[|W| ...], d)
implies DS(W, d) − DS(v[... |W|], d) > (1/d^|W|) · (DS(v[|W| ...], d) − DS(Y, d))
implies gap(W − v[... |W|], d) > (1/d) · (DS(v[|W| ...], d) + μ·d/(d−1)).

This segues into the state space of the Büchi automaton. We define the state space so that state s represents the gap value s. The idea is that all finite-length weight sequences with gap value s terminate in state s. To assign transitions between these states, we observe that the gap value is defined inductively: gap(ε, d) = 0 and gap(W · w, d) = d · gap(W, d) + w, where w ∈ {−μ, ..., μ}. Thus, there is a transition from state s to state t on a ∈ {−μ, ..., μ} if t = d · s + a. Since gap(ε, d) = 0, state 0 is assigned to be the initial state.
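The inductive form gap(W · w, d) = d · gap(W, d) + w translates directly into code; this helper is our illustration, not code from the paper:

```python
def gap(W, d):
    """Recoverable gap of a finite weight sequence W (a list of ints):
    gap([], d) = 0 and gap(W + [w], d) = d * gap(W, d) + w, which
    equals d^(|W|-1) * DS(W, d) for non-empty W."""
    g = 0
    for w in W:
        g = d * g + w
    return g
```

For instance, gap([1, 0, 1], 2) = 5, matching 2² · (1 + 0/2 + 1/4). Note that for integer d the gap stays integral, which is what makes the state space below finite.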

The issue with this construction is that it has infinitely many states. To limit them, we use Lemma 3. Since Item 1 is a necessary and sufficient criterion for bad prefixes of the safety language L≤, all states with gap value larger than the bound in Item 1 are fused into one non-accepting sink. For the same reason, all states with gap value at most the bound in Item 1 are accepting states. Due to Item 2, all states with gap value at most the bound in Item 2 are fused into one accepting sink. Finally, since d is an integer, gap values are integral. Thus, there are only finitely many states between the bounds of Item 2 and Item 1.

**Theorem 6.** Let μ > 0 be an integer upper bound, d > 1 an integer discount factor, R an equality or inequality relation, and v ∈ Q the threshold value with an n-length representation v = v[0] v[1] ... v[m] (v[m+1] v[m+2] ... v[n])ω.

1. The comparator automaton for relation ≤ is a deterministic safety automaton.
2. The comparator automata for the remaining relations are deterministic safety or co-safety automata of the same size.
Proof. To prove Item 1, we present the construction of an ω-regular comparator automaton for integer upper bound μ > 0, integer discount factor d > 1, inequality relation ≤, and threshold value v ∈ Q s.t. v = v[0] v[1] ... v[m] (v[m+1] v[m+2] ... v[n])ω, denoted by A = (S, sI, Σ, δ, F), where:

$$\begin{aligned} \text{For } i \in \{0, \dots, n\}, &\text{ let } \mathbb{U}\_i = \frac{1}{d} \cdot DS(v[i \cdot \cdot], d) + \frac{\mu}{d - 1} \text{ (Lemma 3, Item 1)}\\ \text{For } i \in \{0, \dots, n\}, &\text{ let } \mathbb{L}\_i = \frac{1}{d} \cdot DS(v[i \cdot \cdot], d) - \frac{\mu}{d - 1} \text{ (Lemma 3, Item 2)} \end{aligned}$$

1. If s ∈ {bad, veryGood}, then t = s for all a ∈ Σ.
2. If s is of the form (p, i) and a ∈ Σ:
   (a) If d · p + a − v[i] > Ui, then t = bad.
   (b) If d · p + a − v[i] ≤ Li, then t = veryGood.
   (c) If Li < d · p + a − v[i] ≤ Ui:
      i. If i == n, then t = (d · p + a − v[i], m + 1).
      ii. Else, t = (d · p + a − v[i], i + 1).

We skip the proof of correctness as it follows from the above discussion. Observe that A is deterministic. It is a safety automaton since all non-accepting states are sinks.

To prove Item 2, observe that since the comparator for ≤ is a deterministic safety automaton, the comparator for > is obtained by simply swapping the accepting and non-accepting states. This yields a co-safety automaton of the same size. One can argue similarly for the remaining relations.
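To make the construction concrete, the transition rules above can be sketched in Python. The encoding of v as integer digit lists (prefix, period) with a non-empty period, and the choice of (0, 0) as initial state, are assumptions of this sketch rather than details fixed in the text:

```python
from fractions import Fraction

def make_comparator_step(prefix, period, d, mu):
    """Transition function of the safety comparator for DS(A, d) <= v,
    where v = prefix . (period)^omega is given by integer digit lists.
    Sketch only: the digit representation and initial state (0, 0) are
    assumptions, not fixed by the text; period must be non-empty."""
    v = prefix + period
    m, n = len(prefix) - 1, len(v) - 1
    # DS(v[m+1..], d) for the purely periodic tail, as an exact rational
    P = sum(Fraction(v[j], d ** (j - (m + 1))) for j in range(m + 1, n + 1))
    ds_per = P * Fraction(d ** (n - m), d ** (n - m) - 1)
    def DS(i):  # DS(v[i..], d): finite part up to index n, then the tail
        fin = sum(Fraction(v[j], d ** (j - i)) for j in range(i, n + 1))
        return fin + Fraction(1, d ** (n - i + 1)) * ds_per
    U = [Fraction(1, d) * DS(i) + Fraction(mu, d - 1) for i in range(n + 1)]
    L = [Fraction(1, d) * DS(i) - Fraction(mu, d - 1) for i in range(n + 1)]
    def step(state, a):
        if state in ('bad', 'veryGood'):        # rule 1: sink states
            return state
        p, i = state                            # rule 2: state (p, i)
        x = d * p + a - v[i]
        if x > U[i]:
            return 'bad'                        # rule 2(a)
        if x <= L[i]:
            return 'veryGood'                   # rule 2(b)
        return (x, m + 1 if i == n else i + 1)  # rule 2(c)
    return step
```

Note that all comparisons use exact rationals, so the sketch avoids the floating-point issues that the paper's integer-only implementation also sidesteps.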

#### **4.2 Satisficing via safety and reachability games**

This section describes our comparator-based linear-time algorithm for satisficing for integer discount factors.

As described earlier, given a discount factor d > 1, a play is winning for satisficing with threshold value v ∈ Q and relation R if its cost sequence A satisfies DS(A, d) R v. We now know from Theorem 6 that the winning condition for plays can be expressed as a safety or co-safety automaton for any v ∈ Q as long as the discount factor is an integer. Therefore, a synchronized product of the quantitative game with the safety or co-safety comparator denoting the winning condition completes the reduction to a safety or reachability game, respectively.
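Here DS(A, d) denotes the discounted sum Σ_{i≥0} A[i]/d^i of the cost sequence A. A minimal sketch of its partial sums (the function name is ours), using exact rationals:

```python
from fractions import Fraction

def ds_prefix(weights, d, k):
    """Exact partial discounted sum sum_{i<k} weights[i] / d^i; as k grows
    this converges to DS(A, d). Illustrative helper, names are ours."""
    return sum(Fraction(w, d ** i) for i, w in enumerate(weights[:k]))
```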

**Theorem 7.** Let G = (V, v_init, E, γ) be a quantitative game, d > 1 the integer discount factor, R the equality or inequality relation, and v ∈ Q the threshold value with an n-length representation. Let μ > 0 be the maximum of the absolute values of costs along transitions in G. Then,


Proof. The first two items follow by a standard synchronized-product argument on the following formal reduction [15]. Let G = (V = V_0 ∪ V_1, v_init, E, γ) be a quantitative game, d > 1 the integer discount factor, R the equality or inequality relation, and v ∈ Q the threshold value with an n-length representation. Let μ > 0 be the maximum of the absolute values of costs along transitions in G. The first step is to construct the safety/co-safety comparator A = (S, s_I, Σ, δ, F) for μ, d, R and v. The next step is to synchronize G and A over weights, constructing the product game G_A = (W = W_0 ∪ W_1, (v_init, s_I), δ_W, F_W), where


We need the size of G_A to analyze the worst-case complexity. Clearly, G_A consists of O(|V| · μ · n) states. To establish the number of transitions in G_A, observe that every state (v, s) in G_A has the same number of outgoing edges as the state v in G, because the comparator A is deterministic. Since G_A has O(μ · n) copies of every state v ∈ G, there are a total of O(|E| · μ · n) transitions in G_A. Since G_A is either a safety or a reachability game, it can be solved in time linear in its size. Thus, the overall complexity is O((|V| + |E|) · μ · n).
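The synchronized product can be sketched as a breadth-first construction, assuming the game is given as an adjacency map from states to (weight, successor) pairs and the comparator by its deterministic transition function; the capped-sum comparator used in the test below is a stand-in, and all names are ours:

```python
from collections import deque

def product_states(edges, step, v_init, s0):
    """Reachable part of the product game G_A: states are pairs (v, s),
    and the comparator component s is updated deterministically with the
    weight of each traversed edge. Sketch only: the player partition and
    the accepting states F_W are omitted."""
    start = (v_init, s0)
    prod = {}
    queue = deque([start])
    while queue:
        v, s = queue.popleft()
        if (v, s) in prod:
            continue
        # one product edge per game edge, since step is deterministic
        prod[(v, s)] = [(u, step(s, w)) for w, u in edges[v]]
        queue.extend(t for t in prod[(v, s)] if t not in prod)
    return prod
```

Because the comparator component evolves deterministically, every product state (v, s) has exactly as many successors as v has in the game, which is the observation behind the O(|E| · μ · n) transition count.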

With respect to the value μ, the VI-based solutions are logarithmic in the worst case, while the comparator-based solution is linear, owing to the size of the comparator. From a practical perspective, this may not be a limitation, since weights along transitions can be scaled down. The parameter that cannot be altered is the size of the quantitative game; with respect to that, the comparator-based solution displays clear superiority. Finally, the comparator-based solution is affected by n, the length of the representation of the threshold value, while the VI-based solution is not. It is natural to assume that the value of n is small.

**Fig. 2.** Cactus plot. μ = 5, v = 3. Total benchmarks = 291. **Fig. 3.** Single counter scalable benchmark. μ = 5, v = 3. Timeout = 500s.

#### **4.3 Implementation and Empirical Evaluation**

The goal of the empirical analysis is to determine whether the practical performance of these algorithms aligns with our theoretical findings.

For an apples-to-apples comparison, we implement three algorithms: (a) VIOptimal: optimization via value iteration, (b) VISatisfice: satisficing via value iteration, and (c) CompSatisfice: satisficing via comparators. All tools have been implemented in C++. To avoid floating-point errors in VIOptimal and VISatisfice, these tools invoke the open-source GMP (GNU Multi-Precision) library [2]. Since all arithmetic operations in CompSatisfice involve integers only, it does not use GMP.

To avoid completely randomized benchmarks, we create ∼290 benchmarks from the LTL_f benchmark suite [29]. The state-of-the-art LTL_f-to-automaton tool Lisa [8] is used to convert LTL_f formulas to (non-quantitative) graph games. Weights are randomly assigned to transitions. The number of states in our benchmarks ranges from 3 to 50000+. Discount factor d = 2, threshold v ∈ [0, 10]. Experiments were run on 8 CPU cores at 2.4 GHz with 16 GB RAM on a 64-bit Linux machine.

**Observations and Inferences** Overall, we see that CompSatisfice is efficient and scalable, and exhibits steady and predictable performance.

CompSatisfice outperforms VIOptimal in both runtime and number of benchmarks solved, as shown in Fig. 2. It is crucial to note that all benchmarks solved by VIOptimal had fewer than 200 states. In contrast, CompSatisfice solves much larger benchmarks, with 3 to 50000+ states.

To test scalability, we compared both tools on a set of scalable benchmarks. For integer parameter i > 0, the i-th scalable benchmark has 3 · 2^i states. Fig. 3 plots number of states against runtime on a log-log scale, so the slope of a straight line indicates the degree of the polynomial (in practice). It shows that CompSatisfice exhibits linear behavior (slope ∼1), whereas VIOptimal is much more expensive (slope ≫ 1) even in practice.

**Fig. 4.** Robustness. Fix benchmark, vary v. μ = 5. Timeout = 500s.
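The slope reading can be reproduced with a small least-squares fit on the logarithms of (size, runtime) pairs; the function below is ours, and any data fed to it here would be illustrative rather than the paper's measurements:

```python
import math

def loglog_slope(sizes, times):
    """Least-squares slope of log(time) against log(size); on a log-log
    plot this slope estimates the empirical degree of polynomial growth."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

A slope near 1 indicates linear scaling, as reported for CompSatisfice; a slope well above 1 indicates superlinear cost, as reported for VIOptimal.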

CompSatisfice is more robust than VISatisfice. We compare CompSatisfice and VISatisfice as the threshold value changes. This experiment is chosen because of Theorem 4, which proves that VISatisfice is non-robust. As shown in Fig. 4, the variance in the performance of VISatisfice is very high. The peak close to the optimal value is an empirical demonstration of Theorem 4. On the other hand, CompSatisfice remains steady in performance, owing to its low complexity.

### **5 Adding Temporally Extended Goals**

Having witnessed the algorithmic improvements of comparator-based satisficing over VI-based algorithms, we now shift focus to the question of applicability. While this section examines applicability with respect to the ability to extend to temporal goals, the discussion highlights a core strength of comparator-based reasoning in satisficing and shows its promise for a broader variety of problems.

The problem of extending optimal/satisficing solutions with a temporal goal is to determine whether there exists an optimal/satisficing solution that also satisfies a given temporal goal. Formally, given a quantitative game G, a labeling function L : V → 2^AP which maps states V of G to atomic propositions from the set AP, and a temporal goal ϕ over AP, we say a play ρ = v_0 v_1 … satisfies ϕ if its proposition sequence L(v_0)L(v_1)… satisfies the formula ϕ. To solve optimization/satisficing with a temporal goal is then to determine whether there exists a solution that is optimal/satisficing and also satisfies the temporal goal along the resulting plays. Prior work has proven that the optimization problem cannot be extended to temporal goals [13] unless the temporal goals are very simple safety properties [10,31]. In contrast, our comparator-based solution for satisficing naturally extends to temporal goals, in fact to all ω-regular properties, owing to its automata-based underpinnings, as shown below:

**Theorem 8.** Let G be a quantitative game with state set V, L : V → 2^AP a labeling function over the set of atomic propositions AP, ϕ a temporal goal over AP, and A_ϕ its equivalent deterministic parity automaton. Let d > 1 be an integer discount factor, μ the maximum of the absolute values of costs along transitions, and v ∈ Q the threshold value with an n-length representation. Then, solving satisficing with temporal goals reduces to solving a parity game of size linear in |V|, μ, n and |A_ϕ|.

Proof. The reduction involves two synchronized products. The first reduces the satisficing problem to a safety/reachability game while preserving the labeling function. The second synchronized product is between the safety/reachability game and the DPA A_ϕ; it synchronizes on the atomic propositions in the labeling function and the DPA transitions. Therefore, the resulting parity game is linear in |V|, μ, n, and |A_ϕ|.

Broadly speaking, our ability to solve satisficing via automata-based methods is a key feature, as it enables a seamless integration of quantitative properties (threshold bounds) with qualitative properties, since both are grounded in automata-based methods. VI-based solutions cannot do the same, since numerical methods are known not to combine well with the automata-based methods that are so prominent in qualitative reasoning [5,20]. This key feature could be exploited in several other problems to show further benefits of comparator-based satisficing over optimization and VI-based methods.

### **6 Concluding remarks**

This work introduces the satisficing problem for quantitative games with the discounted-sum cost model. When the discount factor is an integer, we present a comparator-based solution for satisficing, which exhibits algorithmic improvements – better worst-case complexity and efficient, scalable, and robust performance – as well as broader applicability over traditional solutions based on numerical approaches for satisficing and optimization. Other technical contributions include the presentation of the missing proof of value-iteration for optimization and the extension of comparator automata to enable direct comparison to arbitrary threshold values as opposed to zero threshold value only.

An undercurrent of our comparator-based approach for satisficing is that it offers an automata-based replacement to traditional numerical methods. By doing so, it paves a way to combine quantitative and qualitative reasoning without compromising on theoretical guarantees or even performance. This motivates tackling more challenging problems in this area, such as more complex environments, variability in information availability, and their combinations.

**Acknowledgements.** We thank anonymous reviewers for valuable inputs. This work is supported in part by NSF grant 2030859 to the CRA for the CIFellows Project, NSF grants IIS-1527668, CCF-1704883, IIS-1830549, the ERC CoG 863818 (ForM-SMArt), and an award from the Maryland Procurement Office.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Quasipolynomial Computation of Nested Fixpoints**

Daniel Hausmann and Lutz Schröder

Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany {daniel.hausmann,lutz.schroeder}@fau.de

**Abstract.** It is well-known that the winning region of a parity game with n nodes and k priorities can be computed as a k-nested fixpoint of a suitable function; straightforward computation of this nested fixpoint requires O(n^{k/2}) iterations of the function. Calude et al.'s recent quasipolynomial-time parity game solving algorithm essentially shows how to compute the same fixpoint in only quasipolynomially many iterations by reducing parity games to quasipolynomially sized safety games. Universal graphs have been used to modularize this transformation of parity games to equivalent safety games that are obtained by combining the original game with a universal graph. We show that this approach naturally generalizes to the computation of solutions of systems of *any* fixpoint equations over finite lattices; hence, the solution of fixpoint equation systems can be computed by quasipolynomially many iterations of the equations. We present applications to modal fixpoint logics and games beyond relational semantics. For instance, the model checking problems for the energy μ-calculus, finite latticed μ-calculi, and the graded and the (two-valued) probabilistic μ-calculus – with numbers coded in binary – can be solved via nested fixpoints of functions that differ substantially from the function for parity games but still can be computed in quasipolynomial time; our result hence implies that model checking for these μ-calculi is in QP. Moreover, we improve the exponent in known exponential bounds on satisfiability checking.

**Keywords:** Fixpoint theory, model checking, satisfiability checking, parity games, energy games, μ-calculus

### **1 Introduction**

Fixpoints are pervasive in computer science, governing large portions of recursion theory, concurrency theory, logic, and game theory. One famous example is parity games, which are central, e.g., to networks and infinite processes [5], tree automata [43], and μ-calculus model checking [17]. Winning regions in parity games can be expressed as nested fixpoints of particular set functions (e.g. [8,16]). In recent breakthrough work on the solution of parity games in quasipolynomial time, Calude et al. [9] essentially show how to compute this particular fixpoint in quasipolynomial time, that is, in time 2^{O((log n)^c)} for some constant c. Subsequently, it has been shown [13,14,28] that universal graphs (that is, even graphs into which every even graph of a certain size embeds by a graph morphism) can be used to transform parity games into equivalent safety games, obtained by pairing the original game with a universal graph; the size of these safety games is determined by the size of the employed universal graphs, and it has been shown [13,14] that there are universal graphs of quasipolynomial size. This yields a uniform algorithm for solving parity games, to which all currently known quasipolynomial algorithms for parity games have been shown to instantiate using appropriately defined universal graphs [13,14].

Work forms part of the DFG-funded project CoMoC (SCHR 1118/15-1, MI 717/7-1).

© The Author(s) 2021

J. F. Groote and K. G. Larsen (Eds.): TACAS 2021, LNCS 12651, pp. 38–56, 2021. https://doi.org/10.1007/978-3-030-72016-2_3

Briefly, our contribution in the present work is to show that the method of using universal graphs to solve parity games generalizes to the computation of nested fixpoints of arbitrary functions over finite lattices. That is, given functions f_i : P(U)^{k+1} → P(U), 0 ≤ i ≤ k, over a finite set U, we give an algorithm that uses universal graphs to compute the solutions of systems of equations

$$X_i =_{\eta_i} f_i(X_0, \dots, X_k) \qquad \qquad 0 \le i \le k$$

where η<sup>i</sup> = GFP (greatest fixpoint) or η<sup>i</sup> = LFP (least fixpoint). Since there are universal graphs of quasipolynomial size, the algorithm requires only quasipolynomially many iterations of the functions f<sup>i</sup> and hence runs in quasipolynomial time, provided that all f<sup>i</sup> are computable in quasipolynomial time. While it seems plausible that this time bound may also be obtained by translating equation systems to equivalent standard parity games by emulating Turing machines to encode the functions f<sup>i</sup> as Boolean circuits (leading to many additional states but avoiding exponential blowup during the process), we emphasize that the main point of our result is not so much the ensuing time bound but rather the insight that universal graphs and hence many algorithms for parity games can be used on a much more general level which yields a precise (and relatively low) quasipolynomial bound on the number of function calls that are required to obtain solutions of fixpoint equation systems.
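To make the setting concrete over the powerset lattice P(U): by Kleene's fixpoint theorem, each η_i can be computed by plain iteration, and a nested system is solved by re-solving inner equations once per outer iteration. The sketch below computes the classical recurrence set GFP Y. LFP X. pre((P ∩ Y) ∪ X); the example and all names are ours, not the paper's algorithm:

```python
def lfp(f, bottom):
    """Least fixpoint of monotone f by Kleene iteration from bottom."""
    x = bottom
    while f(x) != x:
        x = f(x)
    return x

def gfp(f, top):
    """Greatest fixpoint of monotone f by Kleene iteration from top."""
    x = top
    while f(x) != x:
        x = f(x)
    return x

def recurrence(nodes, edges, P):
    """States with a path that visits P infinitely often, computed as the
    nested fixpoint GFP Y. LFP X. pre((P & Y) | X) over P(nodes)."""
    def pre(S):  # predecessors: nodes with some successor in S
        return frozenset(v for (v, u) in edges if u in S)
    return gfp(lambda Y: lfp(lambda X: pre((P & Y) | X), frozenset()),
               frozenset(nodes))
```

This naive nesting is the O(n^k)-style baseline whose iteration count the universal-graph technique reduces to quasipolynomial.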

In more detail, the method of Calude et al. can be described as annotating nodes of a parity game with histories of quasipolynomial size and then solving this annotated game, but with a safety winning condition instead of the much more involved parity winning condition. It has been shown that these histories can be seen as nodes in universal graphs, in a more general reduction of parity games to safety games in which nodes from the parity game are annotated with nodes from a universal graph. This method has also been described as pairing separating automata with safety games [14]. It has been shown [13,14] that there are exponentially sized universal graphs (essentially yielding the basis for e.g. the fixpoint iteration algorithm [8] or the small progress measures algorithm [27]) and quasipolynomially sized universal graphs (corresponding, e.g., to the succinct progress measure algorithm [28], or to the recent quasipolynomial variant of Zielonka's algorithm [38]).

Hasuo et al. [22], and more generally, Baldan et al. [4] show that nested fixpoints in highly general settings can be computed by a technique based on progress measures, implicitly using exponentially sized universal graphs, obtaining an exponential bound on the number of iterations. Our technique is based on showing that one can make explicit use of universal graphs, correspondingly obtaining a quasipolynomial upper bound on the number of iterations. In both cases, computation of the nested fixpoint is reduced to a single (least or greatest depending on exact formulation) fixpoint of a function that extends the given set function to keep track of the exponential and quasipolynomial histories, respectively, in analogy to the previous reduction of parity games to safety games. Our central result can then be phrased as saying that the method of transforming parity conditions to safety conditions using universal graphs generalizes from solving parity games to solving systems of equations that use arbitrary functions over finite lattices. We use fixpoint games [4, 42] to obtain the crucial result that the solutions of equation systems have history-free witnesses, in analogy to history-freeness of winning strategies in parity games. These fixpoint games have exponential size but we show how to extract polynomial-size witnesses for winning strategies of Eloise, and use these witnesses to show that any node won by Eloise is also won in the safety game obtained by a universal graph. For the backwards direction, we show that a witness for satisfaction of the safety condition regarding the universal graph induces a winning strategy in the fixpoint game. This proves that universal graphs can be used to compute nested fixpoints of arbitrary functions over finite lattices and hence yields the quasipolynomial upper bound for computation of nested fixpoints. 
Moreover, we present a progress measure algorithm that uses the nodes of a quasipolynomial universal graph to measure progress and that can be used to efficiently compute nested fixpoints of arbitrary functions over finite lattices.

As an immediate application of these results, we improve known deterministic algorithms for solving energy parity games [10], that is, parity games in which edges have additional integer weights and for which the winning condition is a combined parity condition and a (quantitative) positivity condition on the sum of the accumulated weights. Our results also show that the model checking problem for the associated energy μ-calculus [2] is in QP. In a similar fashion, we obtain quasipolynomial algorithms for model checking in latticed μ-calculi [7] in which the truth values of formulae are computed over arbitrary finite lattices, and for solving associated latticed parity games [30].

Furthermore, our results improve generic upper complexity bounds on model checking and satisfiability checking in the coalgebraic μ-calculus [12], which serves as a generic framework for fixpoint logics beyond relational semantics. Well-known instances of the coalgebraic μ-calculus include the alternating-time μ-calculus [1], the graded μ-calculus [32], the (two-valued) probabilistic μ-calculus [12,34], and the monotone μ-calculus [18] (the ambient fixpoint logic of concurrent dynamic logic CPDL [39] and Parikh's game logic [37]). This level of generality is achieved by abstracting system types as set functors and systems as coalgebras for the given functor, following the paradigm of universal coalgebra [40]. It was previously shown [24] that the model checking problem for coalgebraic μ-calculi reduces to the computation of a nested fixpoint. This fixpoint may be seen as a coalgebraic generalization of a parity game winning region but can be literally phrased in terms of small standard parity games (implying quasipolynomial run time) only in restricted cases. Our results show that the relevant nested fixpoint can be computed in quasipolynomial time in all cases of interest. Notably, we thus obtain as new specific upper bounds that even under binary coding of numbers, the model checking problems of both the graded μ-calculus and the probabilistic μ-calculus are in QP, even when the syntax is extended to allow for (monotone) polynomial inequalities.

Similarly, the satisfiability problem of the coalgebraic μ-calculus has been reduced to the computation of a nested fixpoint [25], and our present results imply a marked improvement in the exponent of the associated exponential time bound. Specifically, the nesting depth of the relevant fixpoint is exponentially smaller than the basis of the lattice. Our results imply that this fixpoint is computable in polynomial time, so that the complexity of satisfiability checking in coalgebraic μ-calculi drops from 2^{O(n²k² log n)} to 2^{O(nk log n)} for formulae of size n and with alternation depth k.

**Related Work** The quasipolynomial bound on parity game solving has in the meantime been realized by a number of alternative algorithms. For instance, Jurdzinski and Lazic [28] use succinct progress measures to improve to quasilinear (instead of quasipolynomial) space; Fearnley et al. [19] similarly achieve quasilinear space. Lehtinen [33] and Boker and Lehtinen [6] present a quasipolynomial algorithm using register games. Parys [38] improves Zielonka's algorithm [43] to run in quasipolynomial time. In particular the last algorithm is of interest as an additional candidate for generalization to nested fixpoints, due to the known good performance of Zielonka's algorithm in practice. Daviaud et al. [15] generalize quasipolynomial-time parity game solving by providing a pseudo-quasipolynomial algorithm for mean-payoff parity games. On the other hand, Czerwinski et al. [14] give a quasipolynomial lower bound on universal trees, implying a barrier for prospective polynomial-time parity game solving algorithms. Chatterjee et al. [11] describe a quasipolynomial-time set-based symbolic algorithm for parity game solving that is parametric in a lift function that determines how ranks of nodes depend on the ranks of their successors, and thereby unifies the complexity and correctness analysis of various parity game algorithms. Although part of the parity game structure is encapsulated in a set operator CPre, the development is tied to standard parity games, e.g. in the definition of the best function, which picks minimal or maximal ranks of successors depending on whether a node belongs to Abelard or Eloise.

Early work on the computation of unrestricted nested fixpoints has shown that greatest fixpoints require less effort in the fixpoint iteration algorithm, which can hence be optimized to compute nested fixpoints with just O(n^{k/2}) calls of the functions at hand [35,41], improving the previously known (straightforward) bound O(n^k); here, n denotes the size of the basis of the lattice and k the number of fixpoint operators. Recent progress in the field has established the abovementioned approaches using progress measures [22] and fixpoint games [4] in general settings, both with a view to applications in coalgebraic model checking like in the present paper. In comparison to the present work, the respective bounds on the required number of function iterations in the above unrestricted approaches are all exponential.

A preprint of our present results, specifically the quasipolynomial upper bound on function iteration in fixpoint computation, has been available as an arXiv preprint for some time [23]. Subsequent to this preprint, Arnold, Niwinski and Parys [3] have improved the actual run time by reducing the overhead incurred per iteration (and they give a form of quasipolynomial lower bound for universal-tree-based algorithms), working (like [23]) in the less general setting of directly nested fixpoints over powerset lattices; we show in Section 6 how such an improvement can be incorporated also in our lattice-based algorithm.

### **2 Notation and Preliminaries**

Let U and V be sets, and let R ⊆ U × U be a binary relation on U. For u ∈ U, we then put R(u) := {v ∈ U | (u, v) ∈ R}. We put [k] = {0, …, k} for k ∈ N. Labelled graphs G = (W, R) consist of a set W together with a relation R ⊆ W × A × W, where A is some set of labels; typically, we use A = [k] for some k ∈ N. An R-path in a labelled graph is a finite or infinite sequence v_0, a_0, v_1, a_1, v_2, … (ending in a node from W if finite) such that (v_i, a_i, v_{i+1}) ∈ R for all i. For v ∈ W and a ∈ A, we put R_a(v) = {w ∈ W | (v, a, w) ∈ R} and sometimes write |G| to refer to |W|. As usual, we write U^* and U^ω for the sets of finite and infinite sequences, respectively, of elements of U. The domain dom(f) of a partial function f : U ⇀ V is the set of elements on which f is defined. As usual, the (forward) image of A' ⊆ A under a function f : A → B is f[A'] = {b ∈ B | ∃a ∈ A'. f(a) = b}, and the preimage f^{-1}[B'] of B' ⊆ B under f is defined by f^{-1}[B'] = {a ∈ A | ∃b ∈ B'. f(a) = b}.
Projections π_j : A_1 × … × A_m → A_j for 1 ≤ j ≤ m are given by π_j(a_1, …, a_m) = a_j. We often regard (finite) sequences τ = u_0, u_1, … ∈ U^* ∪ U^ω of elements of U as partial functions of type N ⇀ U and then write τ(i) to denote the element u_i, for i ∈ dom(τ). For τ ∈ U^* ∪ U^ω, we define the set Inf(τ) = {u ∈ U | ∀i ≥ 0. ∃j > i. τ(j) = u} of elements that occur infinitely often in τ (so Inf(τ) = ∅ for τ ∈ U^*). An infinite R-path v_0, p_0, v_1, p_1, … in a labelled graph G = (W, R) with labels from [k] is even if max(Inf(p_0, p_1, …)) is even, and G is even if every infinite R-path in G is even. We write P(U) for the powerset of U, and U^m for the m-fold Cartesian product U × ⋯ × U.

**Finite Lattices and Fixpoints** A finite lattice (L, ⊑) (often written just as L) consists of a non-empty finite set L together with a partial order ⊑ on L such that there is, for all subsets X ⊆ L, a join ⊔X and a meet ⊓X. The least and greatest elements of L are defined as ⊥ = ⊔∅ and ⊤ = ⊓∅, respectively. A set B_L ⊆ L such that l = ⊔{b ∈ B_L | b ⊑ l} for all l ∈ L is a basis of L. Given a finite lattice L, a function g : L^k → L is monotone if g(V_1, …, V_k) ⊑ g(W_1, …, W_k) whenever V_i ⊑ W_i for all 1 ≤ i ≤ k. For monotone f : L → L, we put

$$\mathsf{GFP}\,f = \bigsqcup \{ V \in L \mid V \sqsubseteq f(V) \} \qquad \mathsf{LFP}\,f = \bigsqcap \{ V \in L \mid f(V) \sqsubseteq V \},$$

which, by the Knaster-Tarski fixpoint theorem, are the greatest and the least fixpoint of f, respectively. Furthermore, we define f^0(V) = V and f^{m+1}(V) = f(f^m(V)) for m ≥ 0 and V ∈ L; since L is finite, we have GFP f = f^n(⊤) and LFP f = f^n(⊥) for sufficiently large n by Kleene's fixpoint theorem. Given a finite set U and a natural number n, (n^U, ≤) is a finite lattice, where n^U = {f : U → [n − 1]} denotes the function space from U to [n − 1], and f ≤ g if and only if f(u) ≤ g(u) for all u ∈ U. For n = 2, we obtain the powerset lattice (2^U, ⊆), also denoted by P(U), with least and greatest elements ∅ and U, respectively, and basis {{u} | u ∈ U}.

**Parity games** A parity game (V,E,Ω) consists of a set of nodes V , a left-total relation <sup>E</sup> <sup>⊆</sup> <sup>V</sup> <sup>×</sup> <sup>V</sup> of moves encoding the rules of the game, and a priority function <sup>Ω</sup> : <sup>V</sup> <sup>→</sup> <sup>N</sup>, which assigns priorities <sup>Ω</sup>(v) <sup>∈</sup> <sup>N</sup> to nodes <sup>v</sup> <sup>∈</sup> <sup>V</sup> . Moreover, each node belongs to exactly one of the two players Eloise or Abelard, where we denote the set of Eloise's nodes by <sup>V</sup><sup>∃</sup> and that of Abelard's nodes by <sup>V</sup>∀. A play <sup>ρ</sup> <sup>∈</sup> <sup>V</sup> <sup>ω</sup> is an infinite sequence of nodes that follows the rules of the game, that is, such that for all <sup>i</sup> <sup>≥</sup> 0, we have (ρ(i), ρ(<sup>i</sup> + 1)) <sup>∈</sup> <sup>E</sup>. We say that an infinite play ρ = v0, v1,... is even if the largest priority that occurs infinitely often in it (i.e. max(Inf(<sup>Ω</sup> ◦ <sup>ρ</sup>))) is even, and odd otherwise, and call this property the parity of ρ. Player Eloise wins exactly the even plays and player Abelard wins all other plays. A (history-free) Eloise-strategy <sup>s</sup> : <sup>V</sup><sup>∃</sup> V is a partial function that assigns single moves <sup>s</sup>(x) to Eloise-nodes <sup>x</sup> <sup>∈</sup> dom(s). Given an Eloise-strategy <sup>s</sup>, a play <sup>ρ</sup> is an <sup>s</sup>-play if for all <sup>i</sup> <sup>∈</sup> dom(ρ) such that <sup>ρ</sup>(i) <sup>∈</sup> <sup>V</sup>∃, we have <sup>ρ</sup>(<sup>i</sup> + 1) = <sup>s</sup>(ρ(i)). An Eloise-strategy wins a node <sup>v</sup> <sup>∈</sup> <sup>V</sup> if Eloise wins all s-plays that start at v. We have a dual notion of Abelard-strategies; solving a parity game consists in computing the winning regions win<sup>∃</sup> and win<sup>∀</sup> of the two players, that is, the sets of states that they respectively win by some strategy.

It is known that solving parity games is in NP <sup>∩</sup> coNP (and, more specifically, in UP <sup>∩</sup> co-UP). Recently it has also been shown [9] that for parity games with <sup>n</sup> nodes and <sup>k</sup> priorities, win<sup>∃</sup> and win<sup>∀</sup> can be computed in quasipolynomial time <sup>O</sup>(nlog <sup>k</sup>+6). Another crucial property of parity games is that they are history-free determined [21], that is, that every node in a parity game is won by exactly one of the two players and then there is a history-free strategy for the respective player that wins the node.

### **3 Systems of Fixpoint Equations**

We now introduce our central notion, that is, systems of fixpoint equations over a finite lattice. Throughout, we fix a finite lattice (L, ) and a basis B<sup>L</sup> of L such that <sup>⊥</sup> <sup>∈</sup>/ <sup>B</sup>L, and <sup>k</sup> + 1 monotone functions <sup>f</sup><sup>i</sup> : <sup>L</sup>k+1 <sup>→</sup> <sup>L</sup>, 0 <sup>≤</sup> <sup>i</sup> <sup>≤</sup> <sup>k</sup>.

**Definition 3.1.** A system of equations consists of k + 1 equations of the form

$$X\_i =\_{\eta\_i} f\_i(X\_0, \dots, X\_k)$$

where <sup>η</sup><sup>i</sup> ∈ {LFP, GFP}, briefly referred to as <sup>f</sup>. For a partial valuation <sup>σ</sup> : [k] L, we inductively define

$$[X\_i]^\sigma = \eta\_i X\_i.f\_i^\sigma,$$

where the function f<sup>σ</sup> <sup>i</sup> is given by

$$f\_i^{\sigma}(A) = f\_i(\left\lbrack X\_0 \right\rbrack^{\sigma'}, \dots, \left\lbrack X\_{i-1} \right\rbrack^{\sigma'}, A, \mathbf{ev}(\sigma', i+1), \dots, \mathbf{ev}(\sigma', k))$$

for <sup>A</sup> <sup>∈</sup> <sup>L</sup>, where (σ[<sup>i</sup> <sup>→</sup> <sup>A</sup>])(j) = <sup>σ</sup>(j) for <sup>j</sup> <sup>=</sup> <sup>i</sup> and (σ[<sup>i</sup> <sup>→</sup> <sup>A</sup>])(i) = <sup>A</sup>, <sup>σ</sup> <sup>=</sup> <sup>σ</sup>[<sup>i</sup> <sup>→</sup> <sup>A</sup>] and where ev(σ, j) = <sup>σ</sup>(j) if <sup>j</sup> <sup>∈</sup> dom(σ) and ev(σ, j) = [[X<sup>j</sup> ]]<sup>σ</sup> otherwise (the latter clause handles free variables). Then, the solution of the system of equations is [[Xk]] where : [k] L denotes the empty valuation (i.e. dom() = <sup>∅</sup>). Similarly, we can obtain solutions for the other components as [[Xi]] for 0 <sup>≤</sup> i<k; we drop the valuation index if no confusion arises, and sometimes write [[Xi]]<sup>f</sup> to make the equation system f explicit. We denote by E<sup>f</sup><sup>0</sup> the solution [[Xk]] for the canonical system of equations of the particular shape

$$X\_i =\_{\eta\_i} X\_{i-1} \qquad \qquad \qquad X\_0 =\_{\mathsf{GFP}} f\_0(X\_0, \dots, X\_k),$$

where 0 < i <sup>≤</sup> <sup>k</sup>, <sup>η</sup><sup>i</sup> <sup>=</sup> LFP for odd <sup>i</sup> and <sup>η</sup><sup>i</sup> <sup>=</sup> GFP for even <sup>i</sup>.

**Example 3.2.** (1) Parity games and the modal μ-calculus: Let (V,E,Ω) be a parity game with priorities 0 to <sup>k</sup>, take <sup>L</sup> <sup>=</sup> <sup>P</sup>(<sup>V</sup> ), and consider the canonical system of fixpoint equations <sup>E</sup><sup>f</sup><sup>∃</sup> for the function <sup>f</sup><sup>∃</sup> : <sup>P</sup>(<sup>V</sup> )<sup>k</sup>+1 → P(<sup>V</sup> ) given by

$$f\_{\exists}(V\_0, \dots, V\_k) = \{v \in V\_{\exists} \mid E(v) \cap V\_{\Omega(v)} \neq \emptyset\} \cup \{v \in V\_{\forall} \mid E(v) \subseteq V\_{\Omega(v)}, \}$$

for (V0,...,Vk) ∈ P(<sup>V</sup> )<sup>k</sup>+1. It is well known that win<sup>∃</sup> <sup>=</sup> <sup>E</sup><sup>f</sup><sup>∃</sup> , i.e. parity games can be solved by solving fixpoint equation systems. Intuitively, <sup>v</sup> <sup>∈</sup> <sup>f</sup>∃(V0,...,Vk) iff Eloise can enforce that some node in VΩ(v) is reached in the next step. The nested fixpoint expressed by E<sup>f</sup><sup>∃</sup> (in which least (greatest) fixpoints correspond to odd (even) priorities) is constructed in such a way that Eloise only has to rely infinitely often on an argument V<sup>i</sup> for odd i if she can also ensure that some argument V<sup>j</sup> for j>i is used infinitely often.

Model checking for the modal μ-calculus [29] and solving parity games are linear-time equivalent problems. Formulae of the μ-calculus are evaluated over Kripke frames (U, R) with set of states U and transition relation R. Formulae φ of the μ-calculus can be directly represented as equation systems over the lattice <sup>P</sup>(U) by recursively translating <sup>φ</sup> to equations, mapping subformulae μXi. ψ(X0,...,Xk) and νX<sup>j</sup> . ψ(X0,...,Xk) to equations

$$X\_i =\_\mu \psi(X\_0, \dots, X\_k) \qquad \qquad X\_j =\_\nu \chi(X\_0, \dots, X\_k),$$

and interpreting the modalities ♦ and by functions

$$f\_{\triangleright}(X) = \{ u \in U \mid R(u) \cap X \neq \emptyset \} \qquad f\_{\square}(X) = \{ u \in U \mid R(u) \subseteq X \}$$

The solution of the resulting system of equations then is the truth set of the formula φ, that is, model checking for the model μ-calculus reduces to solving fixpoint equation systems. Furthermore, satisfiability checking for the modal μcalculus can be reduced to solving so-called satisfiability games [20], that is, parity games that are played over the set of states of a determinized parity automaton. These satisfiability games can be expressed as systems of fixpoint equations, where the functions track transitions in the determinized automaton.

(2) Energy parity games and the energy μ-calculus: Energy parity games [10] are two-player games played over weighted game arenas (V, E, w, Ω), where <sup>w</sup> : <sup>E</sup> <sup>→</sup> Z assigns integer weights to edges. The winning condition is the combination of a parity condition with a (quantitative) positivity condition on the sum of the accumulated weights. It has been shown [2, 10], that <sup>b</sup> <sup>=</sup> <sup>n</sup> · <sup>d</sup> · <sup>W</sup> is a sufficient upper bound on energy level accumulations in energy parity games with n nodes, k priorities and maximum absolute weight W. We define a function fe <sup>∃</sup> : ((b+ 1)<sup>V</sup> )<sup>k</sup>+1 <sup>→</sup> (b+ 1)<sup>V</sup> over the finite lattice (b+ 1)<sup>V</sup> (whose elements are functions from <sup>V</sup> to the set {0,...,b + 1}) by putting

$$(f\_{\exists}^{\mathbf{e}}(V\_0, \dots, V\_k))(v) = \begin{cases} \min(\mathbf{en}(v, V\_{\Omega(v)})) & \text{if } v \in V\_{\exists} \\ \max(\mathbf{en}(v, V\_{\Omega(v)})) & \text{if } v \in V\_{\forall} \end{cases}$$

for (V0,...,Vk) <sup>∈</sup> ((<sup>b</sup> + 1)<sup>V</sup> )<sup>k</sup>+1 and <sup>v</sup> <sup>∈</sup> <sup>V</sup> , using en(v, σ) as abbreviation for

$$\begin{aligned} \text{en}(v, \sigma) = \{ n \in \{0, \dots, b\} \mid \exists u \in E(v). n = \max\{0, \sigma(u) - w(v, u)\} \} \cup \\ \{b + 1 \mid \exists u \in E(v). \sigma(u) - w(v, u) > b \text{ or } \sigma(u) > b\}, \end{aligned}$$

where <sup>σ</sup> : <sup>V</sup> → {0,...,b + 1}. Then it follows from the results of [2] that player Eloise wins a node v in the energy parity game with minimal initial credit c<b+1 if (E<sup>f</sup><sup>e</sup> <sup>∃</sup> )(v) = c, that is, if the solution of the canonical equation system over f<sup>e</sup> ∃ maps v to a value c that is at most b.

The energy μ-calculus [2] is the fixpoint logic that corresponds to energy parity games. Its formulae are evaluated over weighted game structures and involve operators ♦Eφ and Eφ that are evaluated depending on the energy function [[φ]] : <sup>V</sup> → {0,...,b + 1} that is obtained by first evaluating the argument formula φ. The semantics of the diamond operator then is an energy function that assigns, to each state <sup>v</sup>, the least energy value <sup>c</sup> ∈ {0,...,b + 1} such that there is a move from v to some node u such that the credit c suffices to take the move from v to u and retain an energy level of at least [[φ]](u). Formulae can be translated to equation systems over the finite lattice (b + 1)<sup>V</sup> , where the functions for modal operators are defined according to their semantics as presented in [2]. Solving these equation systems then amounts to model checking energy μ-calculus formulae over weighted game structures.

(3) Latticed μ-calculi: In latticed μ-calculi [7], formulae are evaluated over complete lattices L rather than the powerset lattice; for finite lattices L, formulae of latticed μ-calculi hence can be translated to fixpoint equation systems over L, so that model checking reduces to solving equation systems. An associated latticed

variant of games has been introduced in [30] and for finite lattices L, solving latticed parity games over L reduces to solving equation systems over L.

(4) The coalgebraic μ-calculus and coalgebraic parity games: The coalgebraic μ-calculus [12] supports generalized modal branching types by using predicate liftings to interpret formulae over T-coalgebras, that is, over structures whose transition type is specified by an endofunctor T on the category of sets. For instance the functors <sup>T</sup> <sup>=</sup> <sup>P</sup>, <sup>T</sup> <sup>=</sup> <sup>D</sup> and <sup>T</sup> <sup>=</sup> <sup>G</sup> map sets <sup>X</sup> to their powerset <sup>P</sup>(X), the set of probability distributions <sup>D</sup>(X) = {<sup>f</sup> : <sup>X</sup> <sup>→</sup> [0,..., 1]} over <sup>X</sup>, and to the set of multisets <sup>G</sup>(X) = {<sup>f</sup> : <sup>X</sup> <sup>→</sup> <sup>N</sup>} over <sup>X</sup>, respectively. The corresponding <sup>T</sup>-coalgebras then are Kripke frames (for <sup>T</sup> <sup>=</sup> <sup>P</sup>), Markov chains (for <sup>T</sup> <sup>=</sup> <sup>D</sup>) and graded transition systems (for <sup>T</sup> <sup>=</sup> <sup>G</sup>), respectively. Instances of the coalgebraic μ-calculus comprise, e.g. the two-valued probabilistic <sup>μ</sup>-calculus [12, 34] with modalities ♦p<sup>φ</sup> for <sup>p</sup> <sup>∈</sup> [0,..., 1], expressing 'the next state satisfies φ with probability more than p'; the graded μ-calculus [32] with modalities ♦g<sup>φ</sup> for <sup>g</sup> <sup>∈</sup> <sup>N</sup>, expressing 'there are more than <sup>φ</sup> successor states that satisfy φ'; or the alternating-time μ-calculus [1] that is interpreted over concurrent game frames and uses modalities D<sup>φ</sup> for finite <sup>D</sup> <sup>⊆</sup> <sup>N</sup> (encoding a coalition) that express that 'coalition D has a joint strategy to enforce φ'.

It has been shown in previous work [24] that model checking for coalgebraic μ-calculi against coalgebras with state space U reduces to solving a canonical fixpoint equation system over the powerset lattice <sup>P</sup>(U), where the involved function interprets modal operators using predicate liftings, as described in [12, 24]. This canonical equation system can alternatively be seen as the winning region of Eloise in coalgebraic parity games, a highly general variant of parity games where the game structure is a coalgebra and nodes are annotated with modalities. Examples include two-valued probabilistic parity games and graded parity games in which nodes and edges are annotated with probabilities or grades, respectively. In order to win a node v, player Eloise then has to have a strategy that picks a set of moves to nodes that in turn are all won by Eloise, and such that the joint probability (joint grade) of the picked moves is greater than the probability (grade) that is assigned to v. It is known that solving coalgebraic parity games reduces to solving fixpoint equation systems [24].

Furthermore, the satisfiability problem of the coalgebraic μ-calculus has been reduced to solving canonical fixpoint equations systems over lattices <sup>P</sup>(U), where U is the state set of a determinized parity automaton and where the innermost equation checks for joint one-step satisfiability of sets of coalgebraic modalities [25]. By interpreting coalgebraic formulae over finite lattices d<sup>U</sup> rather than over powerset lattices, one obtains the finite-valued coalgebraic μ-calculus (with values {0,...,d}), which has the finite-valued probabilistic <sup>μ</sup>-calculus (e.g. [36]) as an instance. Model checking for the finite-valued probabilistic μ-calculus hence reduces to solving equation systems over the finite lattice d|U<sup>|</sup> , where {0,...,d} encodes a finite set of probabilities.

### **4 Fixpoint Games and History-free Witnesses**

We instantiate the existing notion of fixpoint games [4, 42], which characterize solutions of equation systems, to our setting (that is, to finite lattices), and then use these games as a technical tool to establish our crucial notion of historyfreeness for systems of fixpoint equations.

**Definition 4.1 (Fixpoint games).** Let <sup>X</sup><sup>i</sup> <sup>=</sup>η<sup>i</sup> <sup>f</sup>i(X0,...,Xk), 0 <sup>≤</sup> <sup>i</sup> <sup>≤</sup> <sup>k</sup>, be a system of fixpoint equations. The associated fixpoint game is a parity game (V,E,Ω) with set of nodes <sup>V</sup> = (B<sup>L</sup> <sup>×</sup> [k]) <sup>∪</sup> <sup>L</sup>k+1, where nodes from <sup>B</sup><sup>L</sup> <sup>×</sup> [k] belong to player Eloise and nodes from Lk+1 belong to player Abelard. For nodes (u, i) <sup>∈</sup> <sup>B</sup><sup>L</sup> <sup>×</sup> [k], we put

$$E(u, i) = \{ (U\_0, \dots, U\_k) \in L^{k+1} \mid u \sqsubseteq f\_i(U\_0, \dots, U\_k) \},$$

and for nodes (U0,...,Uk) <sup>∈</sup> <sup>L</sup>k+1, we put

$$E(U\_0, \ldots, U\_k) = \{(u, i) \in B\_L \times [k] \mid u \sqsubseteq U\_i\}.$$

The alternation depth ad(i) of an equation X<sup>i</sup> =<sup>η</sup><sup>i</sup> fi(X0,...,X1) is defined as ad<sup>μ</sup> <sup>i</sup> if <sup>η</sup><sup>i</sup> <sup>=</sup> <sup>μ</sup> and as ad<sup>ν</sup> <sup>i</sup> if η<sup>i</sup> = ν, where ad<sup>μ</sup> <sup>i</sup> , ad<sup>ν</sup> <sup>i</sup> are recursively defined by

$$\mathsf{ad}\_{i}^{\mu} = \begin{cases} \mathsf{ad}\_{i-1}^{\mu} & i > 0, \eta\_{i-1} = \mu \\ \mathsf{ad}\_{i-1}^{\nu} + 1 & i > 0, \eta\_{i-1} = \nu \\ 1 & i = 0 \end{cases} \qquad \mathsf{ad}\_{i}^{\nu} = \begin{cases} \mathsf{ad}\_{i-1}^{\mu} + 1 & i > 0, \eta\_{i-1} = \mu \\ \mathsf{ad}\_{i-1}^{\nu} & i > 0, \eta\_{i-1} = \nu \\ 0 & i = 0 \end{cases}$$

for 0 <sup>≤</sup> <sup>i</sup> <sup>≤</sup> <sup>k</sup>. The priority function <sup>Ω</sup> : <sup>V</sup> <sup>→</sup> [ad(k)] then is defined by <sup>Ω</sup>(u, i) = ad(i) and Ω(U0,...,Uk) = 0.

**Remark 4.2.** In [4], an alternative priority function <sup>Ω</sup> : <sup>V</sup> <sup>→</sup> [2<sup>k</sup> + 1] with

$$\Omega'(u,i) = \begin{cases} 2i & \text{if } \eta\_i = \mathsf{GFP} \\ 2i+1 & \text{if } \eta\_i = \mathsf{LFP} \end{cases}$$

and Ω (U0,...,Uk) = 0 is used. Since ad(i) is even if and only if η<sup>i</sup> is even, and moreover ad(i) <sup>≤</sup> ad(j) for <sup>i</sup> <sup>≤</sup> <sup>j</sup>, and i<j whenever ad(i) <sup>&</sup>lt; ad(j), it is easy to see that Ω and Ω in fact assign identical parities to all plays. In the following, we will use the more economic parity function Ω so that fixpoint games have only <sup>d</sup> := ad(k) <sup>≤</sup> <sup>k</sup> priorities.

We import the associated characterization theorem [4, Theorem 4.8]:

**Theorem 4.3 ([4]).** We have <sup>u</sup> [[Xi]]<sup>f</sup> if and only if Eloise wins the node (u, i) in the fixpoint game for the given system f of equations.

**Remark 4.4.** While this shows that parity game solving can be used to solve equation systems, the size of fixpoint games is exponential in <sup>|</sup>BL|, so they do not directly yield a quasipolynomial algorithm for solving equation systems.

Next we define our notion of history-freeness for systems of fixpoint equations.

**Definition 4.5 (History-free witness).** <sup>A</sup> history-free witness for <sup>u</sup> [[Xi]]<sup>f</sup> is an even labelled graph (W, R) with labels from [d] such that <sup>W</sup> <sup>⊆</sup> <sup>B</sup><sup>L</sup> <sup>×</sup> [d], (u, i) <sup>∈</sup> <sup>W</sup>, and for all (v, p) <sup>∈</sup> <sup>W</sup>, we have <sup>v</sup> fp(U0,...,Uk) where <sup>U</sup><sup>j</sup> <sup>=</sup> <sup>π</sup>1[Rad(j)(v, p)] for 0 <sup>≤</sup> <sup>j</sup> <sup>≤</sup> <sup>k</sup>, noting that <sup>R</sup>ad(j)(v, p) <sup>⊆</sup> <sup>W</sup> so that <sup>π</sup>1[Rad(j)(v, p)] <sup>⊆</sup> <sup>B</sup><sup>L</sup> and <sup>U</sup><sup>j</sup> <sup>∈</sup> <sup>L</sup>.

In analogy to history-free strategies for parity games, history-free witnesses assign tuples (R1(v, p),...,Rd(v, p)) of sets <sup>R</sup><sup>j</sup> (v, p) <sup>⊆</sup> <sup>W</sup> to pairs (v, p) <sup>∈</sup> <sup>W</sup> without relying on a history of previously visited pairs. We have <sup>|</sup>W| ≤ (<sup>d</sup> + 1)|BL<sup>|</sup> and <sup>|</sup>R| ≤ (<sup>d</sup> + 1)|W<sup>|</sup> <sup>2</sup>, that is, the size of history-free witnesses is polynomial in <sup>|</sup>BL|. Crucially, history-free witnesses always exist:

**Lemma 4.6.** For all <sup>u</sup> <sup>∈</sup> <sup>B</sup><sup>L</sup> and <sup>i</sup> <sup>∈</sup> [k], we have

<sup>u</sup> [[Xi]]<sup>f</sup> if and only if there is a history-free witness for <sup>u</sup> [[Xi]]<sup>f</sup> .

Proof. In one direction, we have <sup>u</sup> [[Xi]]<sup>f</sup> so that Eloise wins the node (u, i) in the according fixpoint game by Lemma 4.3. Let s be a corresponding historyfree winning strategy (such strategies always exists, see e.g. [21]). We inductively construct a witness for <sup>u</sup> [[Xi]]<sup>f</sup> , starting at (u, i). When at (v, p) <sup>∈</sup> <sup>B</sup><sup>L</sup> <sup>×</sup> [k] with s(v, p)=(U0,...,Uk), we put Ri(v, p) = <sup>j</sup>|ad(j)=i(U<sup>j</sup> × {j}) for 0 <sup>≤</sup> <sup>i</sup> <sup>≤</sup> <sup>d</sup> and hence have ad(j) = <sup>i</sup> for all ((v, p), i,(u, j)) <sup>∈</sup> <sup>R</sup>. Since <sup>s</sup> is a winning strategy, the resulting graph (W, R) is a history-free witness for <sup>u</sup> [[Xi]]<sup>f</sup> by construction; in particular, (W, R) is even. For the converse direction, the witness for <sup>u</sup> [[Xi]]<sup>f</sup> directly yields a winning Eloise-strategy for the node (u, i) in the associated fixpoint game. This implies <sup>u</sup> [[Xi]]<sup>f</sup> by Lemma 4.3.

### **5 Solving Equation Systems using Universal Graphs**

We go on to prove our main result. To this end, we fix a system f of fixpoint equations <sup>f</sup><sup>i</sup> : <sup>L</sup>k+1 <sup>→</sup> <sup>L</sup>, 0 <sup>≤</sup> <sup>i</sup> <sup>≤</sup> <sup>k</sup>, and put <sup>n</sup> := <sup>|</sup>BL<sup>|</sup> and <sup>d</sup> := ad(k) for the remainder of the paper.

**Definition 5.1 (Universal graphs [13, 14]).** Let G = (W, R) and G = (W , R ) be labelled graphs with labels from [d]. A homomorphism of labelled graphs from <sup>G</sup> to <sup>G</sup> is a function <sup>Φ</sup> : <sup>W</sup> <sup>→</sup> <sup>W</sup> such that for all (v, p, w) <sup>∈</sup> <sup>R</sup>, we have (Φ(v), p, Φ(w)) <sup>∈</sup> <sup>R</sup> . An (n, d + 1)-universal graph S is an even graph with labels from [d] such that for all even graphs G with labels from [d] and with <sup>|</sup>G| ≤ <sup>n</sup>, there is a homomorphism from <sup>G</sup> to <sup>S</sup>.

We fix an (n(d + 1),(d + 1))-universal graph S = (Z, K), noting that there are (n(d + 1),(d + 1))-universal graphs (obtained from universal trees) of size quasipolynomial in n and d [14]. We now combine the system f with the universal graph S to turn the parity conditions associated to general systems of fixpoint equations into a safety condition, associated to a single greatest fixpoint equation.

**Definition 5.2 (Chained-product fixpoint).** We define a function

$$\begin{aligned} g \colon \mathcal{P}(B\_L \times [k] \times Z) &\to \mathcal{P}(B\_L \times [k] \times Z) \\ U &\quad \mapsto \{(v, p, q) \in B\_L \times [k] \times Z \mid v \subseteq f\_p(P\_0^{U, q}, \dots, P\_k^{U, q})\} \end{aligned}$$

where

$$P\_i^{U,q} = \bigsqcup \{ u \in B\_L \mid \exists s \in K\_{\textup{ad}(i)}(q). (u, i, s) \in U \}.$$

We refer to Y<sup>0</sup> =GFP g(Y0) as the chained-product fixpoint (equation) of f and S.

We now show our central result: apart from the annotation with states from the universal graph, the chained-product fixpoint g is the solution of the system f.

**Theorem 5.3.** For all <sup>u</sup> <sup>∈</sup> <sup>B</sup><sup>L</sup> and <sup>0</sup> <sup>≤</sup> <sup>i</sup> <sup>≤</sup> <sup>k</sup>, we have

$$\text{If } u \subseteq \{X\_i\}\_f \text{ if and only if there is } q \in Z \text{ such that } (u, i, q) \in \{Y\_0\}\_g.$$

Proof. For the forward direction, let <sup>u</sup> [[Xi]]<sup>f</sup> . By Lemma 4.6, there is a historyfree witness <sup>G</sup> = (W, R) for <sup>u</sup> [[Xi]]<sup>f</sup> . Since S is a (n(d + 1), d + 1)-universal graph and since G is a witness and hence an even labelled graph of suitable size <sup>|</sup>G| ≤ <sup>n</sup>(<sup>d</sup> + 1), there is a graph homomorphism <sup>Φ</sup> from <sup>G</sup> to <sup>S</sup>. Starting at (u, i, Φ(u, i), 0), we inductively construct a witness for containment of (u, i, Φ(u, i)) in [[Y0]]g. When at (v1, p1, Φ(v1, p1), 0) with (v1, p1) <sup>∈</sup> <sup>W</sup>, we put

$$\begin{aligned} R\_0'(v\_1, p\_1, \Phi(v\_1, p\_1), 0) &= \{ (v\_2, p\_2, \Phi(v\_2, p\_2), 0) \in B\_L \times [d] \times Z \times [0] \mid \\ &(v\_2, p\_2) \in R\_{\text{ad}(p\_2)}(v\_1, p\_1), \Phi(v\_2, p\_2) \in K\_{\text{ad}(p\_2)}(\Phi(v\_1, p\_1)) \} \end{aligned}$$

and continue the inductive construction with all these (v2, p2, Φ(v2, p2), 0), having (v2, p2) <sup>∈</sup> <sup>W</sup>. The resulting structure <sup>G</sup> = (W , R ) indeed is a witness for containment of (u, i, q) in [[Y0]]g: G is even by construction. Moreover, we need to show that for (v1, p1, Φ(v1, p1), 0) <sup>∈</sup> <sup>W</sup> , we have (v1, p1, Φ(v1, p1), 0) <sup>∈</sup> g(π1[R <sup>0</sup>(v1, p1, Φ(v1, p1), 0)]), i.e. <sup>v</sup><sup>1</sup> f<sup>p</sup><sup>1</sup> (P U,Φ(v1,p1) <sup>0</sup> ,...,P U,Φ(v1,p1) <sup>k</sup> ) where U = π1[R <sup>0</sup>(v1, p1, Φ(v1, p1), 0)]. Since <sup>G</sup> is a witness and (v1, p1) <sup>∈</sup> <sup>W</sup> by construction of W , we have <sup>v</sup><sup>1</sup> f<sup>p</sup><sup>1</sup> (U0,...,Uk) where U<sup>j</sup> = (π<sup>j</sup> [Rad(i)(v1, p1)]). By monotonicity of <sup>f</sup><sup>p</sup><sup>1</sup> , it thus suffices to show that <sup>U</sup><sup>j</sup> P U,Φ(v1,p1) <sup>j</sup> for <sup>0</sup> <sup>≤</sup> <sup>j</sup> <sup>≤</sup> <sup>k</sup>; by definition of <sup>P</sup> U,Φ(v1,p1) <sup>j</sup> this follows if

$$\pi\_1[R\_{\textup{ad}(j)}(v\_1, p\_1)] \subseteq \{ u \in B\_L \mid \exists s \in K\_{\textup{ad}(j)}(\Phi(v\_1, p\_1)).(u, j, s) \in W \},$$

where W = π1[R <sup>0</sup>(v1, p1, q1, 0)]. So let <sup>w</sup> <sup>∈</sup> <sup>B</sup><sup>L</sup> such that <sup>w</sup> <sup>∈</sup> <sup>π</sup>1[Rad(j)(v1, p1)]. Since R is a witness that is constructed as in the proof of Lemma 4.6, we have i = ad(i ) for all ((v , p ), i,(w , i )) <sup>∈</sup> <sup>R</sup>. Thus (w, j) <sup>∈</sup> <sup>R</sup>ad(j)(v1, p1) for some <sup>j</sup> such that ad(j) = <sup>i</sup>, that is, ((v1, p1), ad(j),(w, j)) <sup>∈</sup> <sup>R</sup>, hence (Φ(v1, p1), ad(j), Φ(w, j)) <sup>∈</sup> <sup>K</sup> because <sup>Φ</sup> is a graph homomorphism. By definition of R <sup>0</sup> we have (w, j, Φ(w, j), 0) <sup>∈</sup> <sup>R</sup> <sup>0</sup>(v1, p1, Φ(v1, p1), 0) so that (w, j, Φ(w, j)) <sup>∈</sup> <sup>π</sup>1[R <sup>0</sup>(v1, p1, Φ(v1, p1), 0)]. We are done since <sup>Φ</sup>(w, j) <sup>∈</sup> Kad(j)(Φ(v1, p1)).

For the converse implication, let (u0, p0, q0) <sup>∈</sup> [[Y0]]<sup>g</sup> for some <sup>q</sup><sup>0</sup> <sup>∈</sup> <sup>Z</sup>. Let G = (W, R) be a history-free witness for this fact. By Lemma 4.3, it suffices to provide a strategy in the fixpoint game for the system f with which Eloise wins the node (u0, p0). We inductively construct a history-dependent strategy s as follows: For <sup>i</sup> <sup>≥</sup> 0, we abbreviate <sup>U</sup><sup>i</sup> <sup>=</sup> <sup>R</sup>0(ui, pi, qi, 0). We put <sup>s</sup>(u0, p0) = (P <sup>U</sup>0,q<sup>0</sup> <sup>0</sup> ,...,P <sup>U</sup>0,q<sup>0</sup> <sup>k</sup> ). For the inductive step, let

$$\tau = (u\_0, p\_0), (P\_0^{U\_0, q\_0}, \dots, P\_k^{U\_0, q\_0}), \dots, (P\_0^{U\_{n-1}, q\_{n-1}}, \dots, P\_k^{U\_{n-1}, q\_{n-1}}), (u\_n, p\_n)$$

be a partial play of the fixpoint game that follows the strategy that has been constructed so far. Then we have an R-path (u0, p0, q0, 0),(u1, p1, q1, 0),...,(un, pn, qn, 0), where, for 0 <sup>≤</sup> i<n, we have (qi, pi+1, qi+1) <sup>∈</sup> <sup>K</sup> since <sup>u</sup>i+1 P <sup>U</sup>i,q<sup>i</sup> <sup>p</sup>i+1 by the inductive construction. Put s(τ )=(P <sup>U</sup>n,q<sup>n</sup> <sup>0</sup> ,...,P <sup>U</sup>n,q<sup>n</sup> <sup>k</sup> ). Since <sup>G</sup> is a witness, the strategy uses only moves that are available to Eloise (i.e. ones with <sup>u</sup><sup>n</sup> f<sup>p</sup><sup>n</sup> (s(τ ))). Also, s is a winning strategy as can be seen by looking at the K-paths that are induced by complete plays τ that follow s, as described (for partial plays) above. Since S is a universal graph and hence even, every such K-path is even and the sequence of priorities in <sup>τ</sup> is just the sequence of priorities of one of these <sup>K</sup>-paths.

**Remark 5.4.** Since the set [[Y0]]<sup>g</sup> is the greatest fixpoint of g, it can be computed by simple approximation from above, that is, as <sup>g</sup><sup>m</sup>(B<sup>L</sup> <sup>×</sup> [k] <sup>×</sup> <sup>Z</sup>) where <sup>m</sup> <sup>=</sup> <sup>|</sup>B<sup>L</sup> <sup>×</sup> [k] <sup>×</sup> <sup>Z</sup>|. However, each iteration of the function <sup>g</sup> may require up to <sup>|</sup>Z<sup>|</sup> evaluations of an equation. In the next section, we will show how this additional iteration factor in the computation of [[Y0]]<sup>g</sup> can be avoided.

### **6 A Progress Measure Algorithm**

We next introduce a lifting algorithm that computes the set [[Y0]]<sup>g</sup> efficiently, following the paradigm of the progress measure approach for parity games (e.g. [27, 28]). Our progress measures will map pairs (u, i) <sup>∈</sup> <sup>B</sup><sup>L</sup> <sup>×</sup>[k] to nodes in a universal graph that is equipped with a simulation order, that is, a total order that is suitable for measuring progress.

**Definition 6.1 (Simulation order).** For natural numbers i, i , we put <sup>i</sup> <sup>i</sup> if and only if either i is even and i = i , or both i and i are odd and <sup>i</sup> <sup>≥</sup> <sup>i</sup> . A total order <sup>≤</sup> on <sup>Z</sup> is a simulation order if for all q, q <sup>∈</sup> <sup>Z</sup>,

$$\begin{aligned} q \le q' \text{ implies that for all } 0 \le i \le k \text{ and } s \in K\_i(q), \text{ there are } \\ i' \succeq i \text{ and } s' \in K\_{i'}(q') \text{ such that } s \le s'. \end{aligned}$$

**Lemma 6.2.** There is an (n(d + 1), d + 1)-universal graph (Z, K) of size quasipolynomial in <sup>n</sup> and <sup>d</sup>, and over which a simulation order <sup>≤</sup> exists.

Proof (Sketch). It has been shown [14, Theorem 2.2] (originally, in different terminology, [28]) that there are (l, h)-universal trees (a concept similar to, but slightly more concrete than universal graphs) with set of leaves T such that <sup>|</sup>T| ≤ <sup>2</sup><sup>l</sup> log <sup>l</sup>+h+1 h . Leaves in universal trees are identified by navigation paths, that is, sequences of branching directions, so that the leaves are linearly ordered by the lexicographic order ≤ on navigation paths (which orders leafs from the left to the right). As described in [13], one can obtain a universal graph (T,K) over T in which transitions (q, i, q ) <sup>∈</sup> <sup>K</sup> for odd <sup>i</sup> (the crucial case) move to the left, that is, q is a leaf that is to the left of q in the universal tree (so that q < q), ensuring universality. As it turns out, the lexicographic ordering on T is a simulation order. Adapting this construction to our setting, we put l = n(d + 1) and h = d + 1 and obtain a (n(d + 1), d + 1)-universal graph (along with a simulation order <sup>≤</sup>) of size at most 2n(<sup>d</sup> + 1) log(n(d+1))+d+2 <sup>d</sup>+1 which is quasipolynomial in <sup>n</sup> and <sup>d</sup>.

We fix an (n(d+ 1), d+ 1)-universal graph (Z, K) and a simulation order <sup>≤</sup> on <sup>Z</sup> for the remainder of the paper (these exist by the above lemma).

**Definition 6.3 (Progress measure, lifting function).** We let <sup>q</sup>min <sup>∈</sup> <sup>Z</sup> denote the least node w.r.t. <sup>≤</sup> and fix a distinguished top element /<sup>∈</sup> <sup>Z</sup>, and extend <sup>≥</sup> to <sup>Z</sup> ∪ {} by putting <sup>≥</sup> <sup>q</sup> for all <sup>q</sup> <sup>∈</sup> <sup>Z</sup>. A measure is a map <sup>μ</sup>: <sup>B</sup><sup>L</sup> <sup>×</sup> [k] <sup>→</sup> <sup>Z</sup> ∪ {}, i.e. assigns nodes in the universal graph or to pairs (v, p) <sup>∈</sup> <sup>B</sup><sup>L</sup> <sup>×</sup> [k]. A measure <sup>μ</sup> is a progress measure if whenever <sup>μ</sup>(v, p) <sup>=</sup> , then <sup>v</sup> fp(Uμ,q <sup>0</sup> ,...,Uμ,q <sup>k</sup> ) where <sup>q</sup> <sup>=</sup> <sup>μ</sup>(v, p) and

$$U\_i^{\mu, q} = \bigsqcup \{ u \in B\_L \mid \exists s \in K\_{\text{ad}(i)}(q) . \mu(u, i) \le s \}.$$

We define a function Lift : (B<sup>L</sup> <sup>×</sup> [k] <sup>→</sup> <sup>Z</sup> ∪ {}) <sup>→</sup> (B<sup>L</sup> <sup>×</sup> [k] <sup>→</sup> <sup>Z</sup> ∪ {}) on measures by

$$(\mathsf{Lift}(\mu))(v, p) = \min \{ q \in Z \mid v \sqsubseteq f\_p(U\_0^{\mu, q}, \dots, U\_k^{\mu, q}) \}.$$

where min(Z ) denotes the least element of <sup>Z</sup> w.r.t. <sup>≤</sup>, for ∅ <sup>=</sup> <sup>Z</sup> <sup>⊆</sup> <sup>Z</sup>; also we put min(∅) = .

The lifting algorithm then starts with the least measure mmin that maps all pairs (v, p) <sup>∈</sup> <sup>B</sup><sup>L</sup> <sup>×</sup> [k] to the minimal node (i.e. <sup>m</sup>min(v, p) = <sup>q</sup>min) and repeatedly updates the current measure using Lift until the measure stabilizes.

#### **Lifting algorithm**
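The lifting iteration of Definition 6.3 can be sketched generically as follows. This is a minimal Python sketch under our own naming: `lift(mu, (v, p))` stands in for $\mathsf{Lift}(\mu)(v, p)$ and must be inflationary w.r.t. the simulation order, and the toy instance at the end is purely illustrative (nodes `0..2` with top element `3`).

```python
def lifting_algorithm(basis, k, q_min, lift):
    """Start from the least measure (all pairs mapped to q_min) and
    apply `lift` until no measure changes."""
    mu = {(v, p): q_min for v in basis for p in range(k + 1)}
    changed = True
    while changed:
        changed = False
        for key in mu:
            new = lift(mu, key)
            if new != mu[key]:
                mu[key] = new
                changed = True
    return mu

# Purely illustrative toy instance: nodes 0..2 ordered as usual, with
# top element 3; the lift of 'a' chases the measure of 'b' plus one,
# so both measures rise step by step until they reach top.
TOP = 3
succ = {'a': 'b', 'b': 'a'}
bump = {'a': 1, 'b': 0}

def toy_lift(mu, key):
    v, p = key
    return min(TOP, mu[(succ[v], p)] + bump[v])
```

On the toy instance, `lifting_algorithm(['a', 'b'], 0, 0, toy_lift)` stabilizes with both measures at the top element, mirroring how nodes outside the solution are lifted to $\top$.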


**Lemma 6.4 (Correctness).** For all $v \in B_L$ and $0 \le p \le k$, we have

$(v, p) \in E$ *if and only if* $v \in [\![X_p]\!]_f$.

*Proof (Sketch).* Let $\mu$ denote the progress measure that the algorithm computes. For one direction of the proof, let $(v, p) \in E$. By Lemma 4.6 it suffices to construct a witness for $v \in [\![X_p]\!]_f$. We extract such a witness $(E, R)$ from the progress measure $\mu$, relying on the properties of the simulation order $\le$ that is used to measure the progress of $\mu$ to ensure that any infinite sequence of measures that $\mu$ assigns to some $R$-path induces an infinite (and hence even) path in the employed universal graph. This shows that $(E, R)$ is indeed an even graph and hence a witness. For the converse direction, let $v \in [\![X_p]\!]_f$, so that there is, by Theorem 5.3, some $q \in Z$ such that $(v, p, q) \in [\![Y_0]\!]_g$. For pairs $(u, i)$ such that there is some $q' \in Z$ with $(u, i, q') \in [\![Y_0]\!]_g$, let $q_{(u,i)} \in Z$ denote the minimal such node w.r.t. $\le$. It now suffices to show that $\mu(u, i) \le q_{(u,i)}$ for all such $(u, i)$, which is done by induction on the number of iterations of the lifting algorithm.

**Corollary 6.5.** Solutions of systems of fixpoint equations can be computed with quasipolynomially many evaluations of equations.

*Proof.* Given an $(n(d+1), d+1)$-universal graph $(Z, K)$ and a simulation order on $Z$, the lifting algorithm terminates and returns the solution of $f$ after at most $n(d+1) \cdot |Z|$ iterations. This is the case since each iteration (except the final one) increases the measure for at least one of the $n(d+1)$ nodes, and the measure of each node can be increased at most $|Z|$ times. Using the universal graph and the simulation order from the proof of Lemma 6.2, we have $|Z| \le 2n(d+1)\binom{\lceil \log(n(d+1)) \rceil + d + 2}{d+1}$, so that the algorithm terminates after at most $2(n(d+1))^2\binom{\lceil \log(n(d+1)) \rceil + d + 2}{d+1} \in \mathcal{O}((n(d+1))^{\log(d+1)})$ iterations of the function $\mathsf{Lift}$. Each iteration can be implemented to run with at most $n(d+1)$ evaluations of an equation.

**Corollary 6.6.** The number of function calls required for the solution of systems of fixpoint equations with <sup>d</sup> <sup>≤</sup> log <sup>n</sup> is bounded by a polynomial in <sup>n</sup> and <sup>d</sup>.

*Proof.* Following the insight of Theorem 2.8 in [9], Theorem 2.2 in [14] implies that if $d < \log n$, then there is an $(n(d+1), d+1)$-universal tree of size polynomial in $n$ and $d$. In the same way as in the proof of Lemma 6.2, one obtains a universal graph of polynomial size and a simulation order on it.

**Example 6.7.** Applying Corollary 6.5 and Corollary 6.6 to Example 3.2, we obtain the following results:

(1) The model checking problems for the energy μ-calculus and finite latticed μ-calculi are in QP. For energy parity games with sufficient upper bound b on energy level accumulations, we obtain a progress measure algorithm that terminates after a number of iterations that is quasipolynomial in b.

(2) Under mild assumptions on the modalities (see [24]), the model checking problem for the coalgebraic μ-calculus is in QP; in particular, this yields QP model checking algorithms for the graded μ-calculus and the two-valued probabilistic μ-calculus (equivalently: QP progress measure algorithms for solving graded and two-valued probabilistic parity games).

(3) Under mild assumptions on the modalities (see [25]), we obtain a novel upper bound $2^{\mathcal{O}(nd \log n)}$ for the satisfiability problems of coalgebraic μ-calculi, in particular including the monotone μ-calculus, the alternating-time μ-calculus, the graded μ-calculus and the (two-valued) probabilistic μ-calculus, even when the latter two are extended with (monotone) polynomial inequalities. This improves on the best previous bounds in all cases.

### **7 Conclusion**

We have shown how to use universal graphs to compute solutions of systems of fixpoint equations $X_i = \eta_i.\, f_i(X_0, \ldots, X_k)$ (with the $\eta_i$ marking least or greatest fixpoints) that use functions $f_i\colon L^{k+1} \to L$ (over a finite lattice $L$ with basis $B_L$) and involve up to $(k+1)$-fold nesting of fixpoints. Our progress measure algorithm needs quasipolynomially many evaluations of equations, and runs in time $\mathcal{O}(q \cdot t(f))$, where $q$ is a quasipolynomial in $|B_L|$ and the alternation depth of the equation system, and where $t(f)$ is an upper bound on the time it takes to compute the $f_i$ for all $i$.

As a consequence of our results, the upper time bounds for the evaluation of various general parity conditions improve. Beyond parity game solving, example domains to which our algorithm can be instantiated include model checking for latticed μ-calculi and solving latticed parity games [7, 30], solving energy parity games and model checking for the energy μ-calculus [2, 10], and model checking and satisfiability checking for the coalgebraic μ-calculus [12]. The resulting model checking algorithms for latticed μ-calculi and the energy μ-calculus run in time quasipolynomial in the provided basis of the respective lattice. In terms of concrete instances of the coalgebraic μ-calculus, we obtain, e.g., quasipolynomial-time model checking for the graded [32] and the probabilistic μ-calculus [12, 34] as new results (corresponding results for, e.g., the alternating-time μ-calculus [1] and the monotone μ-calculus [18] follow as well but have already been obtained in our previous work [24]), as well as improved upper bounds for satisfiability checking in the graded μ-calculus, the probabilistic μ-calculus, the monotone μ-calculus, and the alternating-time μ-calculus. We foresee further applications, e.g. in the computation of fair bisimulations and fair equivalences [26, 31] beyond relational systems, e.g. for probabilistic systems.

As in the case of parity games, a natural open question that remains is whether solutions of fixpoint equations can be computed in polynomial time (which would of course imply that parity games can be solved in polynomial time). A more immediate perspective for further investigation is to generalize the recent quasipolynomial variant [38] of Zielonka's algorithm [43] for solving parity games to solving systems of fixpoint equations, with a view to improving efficiency in practice.

### **References**

1. Alur, R., Henzinger, T., Kupferman, O.: Alternating-time temporal logic. J. ACM **49**, 672–713 (2002), https://doi.org/10.1145/585265.585270



# **SMT Verification**

# **A Flexible Proof Format for SAT Solver-Elaborator Communication**

Seulkee Baek, Mario Carneiro, and Marijn J.H. Heule

Carnegie Mellon University, Pittsburgh, PA, United States {seulkeeb,mcarneir,mheule}@andrew.cmu.edu

**Abstract.** We introduce FRAT, a new proof format for unsatisfiable SAT problems, and its associated toolchain. Compared to DRAT, the FRAT format allows solvers to include more information in proofs to reduce the computational cost of subsequent elaboration to LRAT. The format is easy to parse forward and backward, and it is extensible to future proof methods. The provision of optional proof steps allows SAT solver developers to balance implementation effort against elaboration time, with little to no overhead on solver time. We benchmark our FRAT toolchain against a comparable DRAT toolchain and confirm >84% median reduction in elaboration time and >94% median decrease in peak memory usage.

**Keywords:** Satisfiability · Proof format · DRAT · LRAT · FRAT.

### **1 Introduction**

The Boolean satisfiability problem is the problem of determining, for a given Boolean formula consisting of Boolean variables and connectives, whether there exists a variable assignment under which the formula evaluates to true. Boolean satisfiability (SAT) is interesting in part because surprisingly diverse types of problems can be encoded as Boolean formulas and solved efficiently by checking their satisfiability. SAT solvers, programs that automatically solve SAT problems, have been successfully applied to a wide range of areas, including hardware verification [2], planning [14], and combinatorics [12].

The performance of SAT solvers has taken great strides in recent years, and modern solvers can often solve problems involving millions of variables and clauses, which would have been unthinkable a mere 20 years ago [15]. But this improvement comes at the cost of a significant increase in the code complexity of SAT solvers, which makes it difficult to either assume their correctness on faith or certify their program correctness directly. As a result, the ability of SAT solvers to produce independently verifiable certificates has become a pressing necessity. Since there is an obvious certificate format (the satisfying Boolean assignment) for satisfiable problems, the real challenge in proof-producing SAT

Partially supported by AFOSR grant FA9550-18-1-0120

Supported by the National Science Foundation under grant CCF-2010951

© The Author(s) 2021

J. F. Groote and K. G. Larsen (Eds.): TACAS 2021, LNCS 12651, pp. 59–75, 2021. https://doi.org/10.1007/978-3-030-72016-2_4

solving is in devising a compact proof format for unsatisfiable problems, and developing a toolchain that efficiently produces and verifies it.

The current de facto standard proof format for unsatisfiable SAT problems is DRAT [10]. The format, as well as its predecessor DRUP, was designed with a strong focus on quick adoption by the community, emphasizing easy proof emission, practically zero overhead, and reasonable validation speed [11]. DRAT has been the only supported proof format in the SAT Competitions and Races since 2014, as entrants lost interest in alternatives.

DRAT is a clausal proof format [6], which means that a DRAT proof consists of a sequence of instructions for adding and deleting clauses. It is helpful to think of a DRAT proof as a program for modifying the 'active multiset' of clauses: the initial active multiset is the clauses of the input problem, and this multiset grows and shrinks over time as the program is executed step by step. The invariant throughout program execution is that the active multiset at any point of time is at least as satisfiable as the initial active multiset. This invariant holds trivially in the beginning and after a deletion; it is also preserved by addition steps by either RUP or RAT, which we explain shortly. The last step of a DRAT proof is the addition of the empty clause, which ensures the unsatisfiability of the final active multiset, and hence that of the initial active multiset, i.e. the input problem.
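The "program over an active multiset" view can be sketched as follows. This is a minimal Python sketch of our own that merely replays addition and deletion steps, without validating the RUP/RAT side conditions; clauses are tuples of DIMACS-style literals.

```python
from collections import Counter

def replay(formula, proof):
    """Replay a clausal proof as a program over the active multiset.
    `proof` is a list of ('a', clause) / ('d', clause) steps. Returns
    the final multiset and whether the empty clause () was derived."""
    active = Counter(formula)            # initial multiset: the input clauses
    for op, clause in proof:
        if op == 'a':
            active[clause] += 1          # addition (RUP/RAT not checked here)
        else:
            assert active[clause] > 0, "deleting a clause that is not active"
            active[clause] -= 1
    return active, active[()] > 0
```

A valid refutation is then a proof whose replay ends with the empty clause `()` in the active multiset.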

Every addition step in DRAT is either a reverse unit propagation (RUP) step [6] or a resolution asymmetric tautology (RAT) step [13]. A clause $C$ has the property AT (asymmetric tautology) with respect to a formula $F$ if $F, \bar{C} \vdash_1 \bot$, which is to say, there is a proof of the empty clause by unit propagation using $F$ and the negated literals in $C$. A RUP step that adds $C$ to the active multiset $F$ is valid if $C$ has property AT with respect to $F$. A clause $l \vee C$ has property RAT with respect to $F$ if for every clause $\bar{l} \vee D \in F$, the clause $C \vee D$ has property AT with respect to $F$. In this case, $C$ is not logically entailed by $F$, but $F$ and $F \wedge C$ are equisatisfiable, and a RAT step will add $C$ to the active multiset if $C$ has property RAT with respect to $F$. (See [10] for more about the justification for this proof system.)
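The AT check by unit propagation can be sketched as follows. This is a minimal Python sketch of our own; clauses are lists of DIMACS-style nonzero integer literals, and the function returns whether the candidate clause has property AT with respect to the formula.

```python
def rup(formula, clause):
    """Check the AT property: unit propagation from `formula`
    together with the negated literals of `clause` derives a
    conflict."""
    units = {-lit for lit in clause}     # negated literals of the candidate
    changed = True
    while changed:
        changed = False
        for c in formula:
            if any(lit in units for lit in c):
                continue                 # clause is already satisfied
            unassigned = [lit for lit in c if -lit not in units]
            if not unassigned:
                return True              # conflict found: AT holds
            if len(unassigned) == 1:     # unit clause: propagate it
                units.add(unassigned[0])
                changed = True
    return False                         # propagation stabilized, no conflict
```

This is exactly the "blind" propagation a DRAT checker performs; an LRAT checker instead visits only the hinted clauses, in order.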

DRAT has a number of advantages over formats based on more traditional proof calculi, such as resolution or analytic tableaux. For SAT solvers, DRAT proofs are easier to emit because CNF clauses are the native data structures that the solvers store and manipulate internally. Whenever a solver obtains a new clause, the clause can be simply streamed out to a proof file without any further modification. Also, DRAT proofs are more compact than resolution proofs, as the latter can become infeasibly large for some classes of SAT problems [7].

There is, however, room for further improvement in the DRAT format due to the information loss incurred by DRAT proofs. Consider, for instance, the SAT problem and proofs shown in Figure 1. The left column is the input problem in the DIMACS format, the center column is its DRAT proof, and the right column is the equivalent proof in the LRAT format, which can be thought of as an enriched version of DRAT with more information. The numbers before the first zero on lines without a "d" represent literals: positive numbers denote positive literals, while negative numbers denote negative literals. The first clause of the input formula is $(x_1 \vee x_2 \vee \bar{x}_3)$, or equivalently `1 2 -3 0` in DIMACS.

The first lines of both DRAT and LRAT proofs are RUP steps for adding the clause $(x_1 \vee x_2)$, written `1 2 0`. When an LRAT checker verifies this step, it is informed of the IDs of active clauses (the trailing numbers `1 6 3`) relevant for unit propagation, in the exact order they should be used. Therefore, the LRAT checker only has to visit the first, sixth, and third clauses and confirm that, starting with the unit literals $\bar{x}_1, \bar{x}_2$, they yield the new unit literals $\bar{x}_3, x_4, \bot$. In contrast, a DRAT checker verifying the same step must add the literals $\bar{x}_1, \bar{x}_2$ to the active multiset (in this case, the eight initial clauses) and carry out a blind unit propagation with the whole resulting multiset until contradiction. This omission of RUP information in DRAT proofs introduces significant overhead in proof verification. Although the exact figures vary from problem to problem, checking a DRAT proof typically takes approximately twice as long as solving the original problem, whereas the verification time for an LRAT proof is negligible compared to its solution time. This additional cost of checking DRAT proofs also represents a lost opportunity: when a SAT solver emits a RUP step, it knows exactly how the new clause was obtained, and this knowledge can (in theory) be turned into an LRAT-style RUP annotation, which can cut down verification costs significantly if conveyed to the verifier.

For the DRAT format, a design choice was made not to include such information, since demanding explicit proofs for all steps turned out to be impractical. Although it is theoretically possible to always glean the correct RUP annotation from the solver state, computing this information can be intricate and costly for some types of inferences (e.g. conflict-clause minimization [22]), making it harder to support proof logging [25]. Reducing such overheads is particularly important when solving satisfiable formulas, as proofs are superfluous for them and the penalty for maintaining proof information should be minimized. We should note, however, that proof elaboration need not be an all-or-nothing business; if it is infeasible to demand 100% elaborated proofs, we can still ask solvers to fill in as many gaps as is convenient for them, which would still be a considerable improvement over handling everything on the verifier side.

Inclusion of final clauses is another potential area for improvement over the DRAT format. A DRAT proof typically includes many addition steps that do not ultimately contribute to the derivation of the empty clause. This is unavoidable in the proof emission phase, since a SAT solver cannot know in advance whether a given clause will ultimately be useful, and must stream out the clause before it can find out. All such steps, however, should be dropped in the postprocessing phase in order to compress proofs and speed up verification. The most straightforward way of doing this is to process the proof in reverse order [6]: when processing a clause $C_{k+1}$, identify all the clauses used to derive $C_{k+1}$, mark them as 'used', and move on to clause $C_k$. For each clause, process it if it is marked as used, and skip it otherwise. The only caveat of this method is that the postprocessor needs to know which clauses were present at the very end of the proof, since there is no way to identify which clauses were used to derive the


**Fig. 1.** DRAT and LRAT proofs of a SAT problem. Whitespace and alignment are not significant; we have aligned lines of the DRAT proof with the corresponding LRAT lines (`d` steps in LRAT may correspond to multiple DRAT `d` steps).

empty clause otherwise. Although it is possible to enumerate the final clauses by a preliminary forward pass through a DRAT proof, this is clearly unnecessary work since SAT solvers know exactly which clauses are present at the end, and it is desirable to put this information in the proof in the first place.

### **2 The FRAT format**

To address the above issues, we introduce FRAT, a new proof format designed to allow fine-grained communication between SAT solvers and elaborators. The main differences between FRAT and DRAT are:

1. addition steps may carry optional LRAT-style unit propagation hints,
2. the clauses that are active at the end of the proof are explicitly marked as finalized, and
3. every clause is labeled with a unique identifier.
We've already explained the rationale for (1) and (2); (3) is necessary for concise references to clauses in deletions and RUP step annotations. More specifically, a FRAT proof consists of the following six types of proof steps:

- `o ⟨id⟩ ⟨literals⟩ 0`: an *original* step, asserting that the clause with the given ID occurs in the input formula;
- `a ⟨id⟩ ⟨literals⟩ 0`: an *addition* step, optionally followed by an `l` step carrying its proof;
- `l ⟨ids⟩ 0`: an LRAT-style unit propagation *hint* for the preceding addition step;
- `d ⟨id⟩ ⟨literals⟩ 0`: a *deletion* step, removing the clause from the active set;
- `f ⟨id⟩ ⟨literals⟩ 0`: a *finalization* step, asserting that the clause is active at the end of the proof;
- `r (⟨id⟩ ⟨id⟩)* 0`: a *relocation* step, moving clauses from the first ID of each pair to the second.
For RAT addition steps, the negative IDs in the proof refer to the clauses in the active set that contain the negated pivot literal, followed by the unit propagation proof of the resolvent. See [3] for more details on the LRAT checking algorithm.


(Our modified version of CaDiCaL also outputs a seventh kind of step, `t ⟨todo-id⟩ 0`, to collect statistics on code paths that produce `a` steps without proofs. See Section 3 for how this information is used.)

Figure 1 is an example from [3], which includes a SAT problem in DIMACS format, and the proofs of its unsatisfiability in DRAT and LRAT formats. It shows how proofs are produced and elaborated via the DRAT toolchain. Figure 2 shows the corresponding problem and proofs for the FRAT toolchain. Notice how the FRAT proof is more verbose than its DRAT counterpart and includes all the hints for addition steps, which are reused in the subsequent LRAT proof.

**Binary FRAT** The files shown in Figure 2 are in the text version of the FRAT format, but for efficiency reasons solvers may also wish to use a binary encoding. The binary FRAT format is exactly the same in structure, but the integers are encoded using the same variable-length integer encoding used in binary DRAT [9]. Unsigned numbers are encoded in 7-bit little endian, with the high bit set on each byte except the last. That is, the number

$$n = x\_0 + 2^7 x\_1 + \dots + 2^{7k} x\_k$$

(with each x<sup>i</sup> < 2<sup>7</sup>) is encoded as

$$\mathbf{1}x\_0 \;\; \mathbf{1}x\_1 \;\; \dots \;\; \mathbf{0}x\_k$$

Signed numbers are encoded by mapping $n \ge 0$ to $f(n) := 2n$ and $-n$ (with $n > 0$) to $f(-n) := 2n + 1$, and then using the unsigned encoding. (Incidentally, the mapping $f$ is not surjective, as it misses 1. But it is used by other formats, so we have decided not to change it.)
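Both encodings can be sketched in a few lines; this is a minimal Python sketch of our own, following the description above.

```python
def encode_unsigned(n: int) -> bytes:
    """7-bit little-endian encoding: the high bit is set on every
    byte except the last."""
    out = bytearray()
    while True:
        n, x = divmod(n, 128)
        if n:
            out.append(0x80 | x)     # continuation byte (never zero)
        else:
            out.append(x)            # final byte, high bit clear
            return bytes(out)

def encode_signed(n: int) -> bytes:
    """Map n >= 0 to 2n and -n (with n > 0) to 2n + 1, then use the
    unsigned encoding."""
    return encode_unsigned(2 * n if n >= 0 else 2 * (-n) + 1)
```

For instance, the signed literal −3 encodes to the single byte `07`, and the unsigned step ID 9 to `09`, matching the binary add step shown in Figure 4.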


**Fig. 2.** FRAT and LRAT proofs of a SAT problem. To illustrate that proofs are optional, we have omitted the proofs of steps 11 and 12 in this example. The steps must still be legal RAT steps, but their proofs are derived by the elaborator rather than provided by the solver.

#### **2.1 Flexibility and extensibility**

The purpose of the FRAT format is for solvers to be able to quickly write down what they are doing while they are doing it, with the elaborator stage "picking up the pieces" and preparing the proof for consumption by simpler mechanisms such as certified LRAT checkers. As such, it is important that we are able to concisely represent all manner of proof methods used by modern SAT solvers.

The high level syntax of a FRAT file is quite simple: A sequence of "segments", each of which begins with a character, followed by zero or more nonzero numbers, followed by a 0. In the binary version, each segment similarly begins with a printable character, followed by zero or more nonzero bytes, followed by a zero byte. (Note that continuation bytes in an unsigned number encoding are always nonzero.) This means that it is possible to jump into a FRAT file and find segment boundaries by searching for a nearby zero byte.

$$\begin{aligned}
\langle proof \rangle &\leftarrow \langle line \rangle^{*} \\
\langle line \rangle &\leftarrow \langle orig \rangle \mid \langle add \rangle \mid \langle del \rangle \mid \langle final \rangle \mid \langle reloc \rangle \\
\langle add \rangle &\leftarrow \langle addSeg \rangle \mid \langle addSeg \rangle \, \langle hint \rangle \\
\langle orig \rangle &\leftarrow \mathbf{o} \ \langle id \rangle \ \langle literal \rangle^{*} \ 0 \\
\langle addSeg \rangle &\leftarrow \mathbf{a} \ \langle id \rangle \ \langle literal \rangle^{*} \ 0 \\
\langle del \rangle &\leftarrow \mathbf{d} \ \langle id \rangle \ \langle literal \rangle^{*} \ 0 \\
\langle final \rangle &\leftarrow \mathbf{f} \ \langle id \rangle \ \langle literal \rangle^{*} \ 0 \\
\langle reloc \rangle &\leftarrow \mathbf{r} \ (\langle id \rangle \ \langle id \rangle)^{*} \ 0 \\
\langle hint \rangle &\leftarrow \mathbf{l} \ (\langle id \rangle \mid -\langle id \rangle)^{*} \ 0 \\
\langle id \rangle &\leftarrow \langle pos \rangle \\
\langle literal \rangle &\leftarrow \langle pos \rangle \mid \langle neg \rangle \\
\langle neg \rangle &\leftarrow -\langle pos \rangle \\
\langle pos \rangle &\leftarrow \texttt{[1-9]}\,\texttt{[0-9]}^{*}
\end{aligned}$$

**Fig. 3.** Context-free grammar for the FRAT format.

| format | add step | hint step |
|--------|----------|-----------|
| text   | `a 9 -3 -4 0` | `l 5 1 8 0` |
| binary | `61 09 07 09 00` | `6C 0A 02 10 00` |

**Fig. 4.** Comparison of binary and text formats for a step. Note that the step ID 9 uses the unsigned encoding, but literals and LRAT-style proof steps use the signed encoding.

This is in contrast to binary LRAT, in which add steps are encoded as `a ⟨id⟩ ⟨literal⟩* 0 (±⟨id⟩)* 0`, because a random zero byte could either be the end of a segment or the middle of an add step. Since 0x61, the ASCII representation of `a`, is also a valid step ID (encoding the signed number −48), in a sequence such as `(a ⟨nonzero⟩* 0)*` the literals and the steps cannot be locally disambiguated.

The local disambiguation property is important for our FRAT elaborator, because it means that we can efficiently parse FRAT files generated by solvers backward, reading the segments in reverse order so that we can perform backward checking in a single pass.
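Under the zero-terminated segment layout described above, a backward scan over a binary FRAT file can be sketched as follows. This is a minimal Python sketch of our own; it recovers segments in reverse order by locating terminator bytes, which works precisely because continuation bytes and segment characters are never zero.

```python
def segments_reversed(data: bytes):
    """Yield the segments of a binary FRAT file in reverse order.
    Each segment is a printable character, zero or more nonzero
    bytes, and a terminating zero byte."""
    end = len(data)                      # one past the current segment's 0x00
    for i in range(len(data) - 2, -1, -1):
        if data[i] == 0:                 # terminator of the previous segment
            yield data[i + 1:end]
            end = i + 1
    if end > 0:
        yield data[:end]                 # first segment of the file
```

On the two-step binary example of Figure 4, this yields the `l` segment first and the `a` segment second.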

DRAT is based on adding clauses that are RAT with respect to the active formula. It is quite versatile and sufficient for most common cases, covering CDCL steps, hyper-resolution, unit propagation, blocked clause elimination and many other techniques. However, we recognize that not all methods can be cast into this format, or that some are too expensive to translate into this proof system. In this work we define only six segment characters (`a`, `d`, `f`, `l`, `o`, `r`), which suffice to cover the methods used by SAT solvers targeting DRAT. However, the format is forward-compatible with new kinds of proof steps, which can be indicated with different characters.

For example, CryptoMiniSat [21] is a SAT solver that also supports XOR clause extraction and reasoning, and can derive new XOR clauses using proof techniques such as Gaussian elimination. Encoding this in DRAT is quite complicated: the XOR clauses must be Tseitin-transformed into CNF, and Gaussian elimination requires a long resolution proof. Participants in SAT competitions therefore turn this reasoning method off, as producing the DRAT proofs is either too difficult or the performance gains are canceled out by the overhead.

FRAT resolves this impasse by allowing the solver to express itself with minimal encoding overhead. A hypothetical extension to FRAT would add new segment characters to allow adding and deleting XOR clauses, and a new proof method for proof by linear algebra on these clauses. The FRAT elaborator would be extended to support the new step kinds, and it could either perform the expensive translation into DRAT at that stage (only doing the work when it is known to be needed for the final proof), or it could pass the new methods on to some XLRAT backend format that understands these steps natively. Since the extension is backward compatible, it can be done without impacting any other FRAT-producing solvers.

### **3 FRAT-producing solvers**

The FRAT proof format is designed to allow conversion of DRAT-producing solvers into FRAT-producing solvers at minimal cost, both in terms of implementation effort and impact on runtime efficiency. In order to show the feasibility of such conversions, we chose two popular SAT solvers, CaDiCaL<sup>1</sup> and MiniSat<sup>2</sup>, to modify as case studies. The solvers were chosen to demonstrate two different aspects of feasibility: since MiniSat forms the basis of the majority of modern SAT solvers, an implementation using MiniSat shows that the format is widely applicable, and provides code which developers can easily incorporate into a large number of existing solvers. CaDiCaL, on the other hand, is a cutting-edge modern solver which employs a wide range of sophisticated optimizations. A successful conversion of CaDiCaL shows that the technology is scalable, and is not limited to simpler toy examples.

As mentioned in Section 2, the main solver modifications required for FRAT production are the inclusion of clause IDs, finalization steps, and LRAT proof traces. The provision of IDs requires some non-trivial modification, as many solvers, including CaDiCaL and MiniSat, do not natively keep track of clause IDs, and DRAT proofs use literal lists up to permutation for clause identity. In CaDiCaL, we added IDs to all clauses, leading to 8 bytes of overhead per clause. Additionally, unit clauses are tracked separately, and ensuring proper ID tracking for unit clauses resulted in some added code complexity. In MiniSat, we achieved zero bytes of overhead by using the pointer value of clauses as their ID, with unit clauses having computed IDs based on the literal. This requires the use of relocation steps during garbage collection. The output of finalization steps requires identifying

<sup>1</sup> https://github.com/digama0/cadical

<sup>2</sup> https://github.com/digama0/minisat

the active set from the solver state, which can be subtle depending on the solver architecture, but is otherwise a trivial task assuming knowledge of the solver.

LRAT trace production is the heart of the work, and requires the solver to justify each addition step. This modification is comparatively easy to apply to MiniSat, as it only adds clauses in a few places, and already tracks the "reasons" for each literal in the current assignment, which makes the proof trace straightforward. In contrast, CaDiCaL has over 30 ways to add clauses; in addition to the main CDCL loop, there are various in-processing and optimization passes that can create new clauses.

To accommodate this complexity, we leverage the flexibility of the FRAT format, which allows optional hints: by focusing on the most common clause addition steps, we reap the majority of the runtime advantage with only a few changes. The FRAT elaborator falls back on the standard elaboration-by-unit-propagation when proofs are not provided, so future work can add more proofs to CaDiCaL without any changes to the toolchain.

To maximize the efficacy of the modification, we used a simple method to find places to add proofs. In the first pass, we added support for clause ID tracking and finalization, and changed the output format to FRAT syntax. Since CaDiCaL was already producing DRAT proofs, we could easily identify the addition and removal steps and replace them with `a` and `d` steps. Once this is done, CaDiCaL produces valid FRAT files which can pass through the elaborator and yield LRAT results, but it will be quite slow, since the FRAT elaborator is essentially acting as a less-optimized version of DRAT-trim at this point.

We then find all code paths that lead to an `a` step being emitted, and add an extra call to output a step of the form `t ⟨todo-id⟩ 0`, where ⟨todo-id⟩ is some unique identifier of this position in the code. The FRAT elaborator is configured to ignore these steps, so they have no effect, but by running the solver on benchmarks we can count how many `t` steps of each kind appear, and so see which code paths are hottest.

The basic idea is that elaborating a step that has a proof is much faster than elaborating a step that doesn't, but the distribution of code paths leading to add steps is highly skewed, so adding proofs to the top 3 or 4 paths already decreases the elaboration time by over 70%. At the time of writing, about one third of CaDiCaL code paths are covered, and the median elaboration time is about 15% that of DRAT-trim (see Section 5). (This is despite the fact that our elaborator could stand to improve on low-level optimizations, and runs about twice as slow as DRAT-trim when no proofs are provided.)

### **4 Elaboration**

The main tasks of the FRAT-to-LRAT elaborator<sup>3</sup> are provision of missing RUP step hints, elimination of irrelevant clause additions, and re-labeling clauses with new IDs. These tasks are performed in two separate 'passes' over files, writing

<sup>3</sup> The elaborator used for this paper can be found at https://github.com/digama0/ frat/tree/tacas.

**Algorithm 1** First pass (elaboration): FRAT to elaborated reversed FRAT


and reading directly to disk (so the entire proof is never in memory at once). In the first pass, the elaborator reads the FRAT file and produces a temporary file (which may be stored on disk or in memory depending on configuration). The temporary file is essentially the original FRAT file with the steps put in reverse order, while satisfying the following additional conditions:

- every addition step is annotated with an LRAT-style unit propagation proof, and
- steps that do not contribute to the derivation of the empty clause are omitted.
Algorithm 1 shows the pseudocode of the first pass, Elaborate(cert). Here, cert is the FRAT proof obtained from the SAT solver, and the pass works by iterating over its steps in reverse order, producing the temporary file revcert. The map F maintains the active formula as a map with unique IDs for each clause (double inserts and removes to F are always error conditions), and the effect of each step is replayed backwards to reconstruct the solver's state at the point each step was produced.
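The backward pass can be sketched as follows. This is a simplified Python sketch of our own (not the paper's Algorithm 1 verbatim), with steps represented as `(kind, id, literals, hint)` tuples; `find_rup_proof`, the fallback elaboration by unit propagation, is a hypothetical helper left as a stub.

```python
def find_rup_proof(active, lits):
    # Hypothetical fallback: elaboration by unit propagation (omitted here).
    raise NotImplementedError

def elaborate(cert):
    """Single backward pass over a FRAT proof: replay steps in reverse,
    keep only steps needed for the empty clause, and attach a hint to
    every surviving addition step. Returns the elaborated proof in
    reverse order (the second pass reverses and renumbers it)."""
    revcert = []              # elaborated steps, in reverse order
    needed = set()            # IDs of clauses known to be needed
    active = {}               # reconstruction of the solver's active formula
    for kind, cid, lits, hint in reversed(cert):
        if kind == 'f':       # finalization: clause is active at the end
            active[cid] = lits
            if not lits:      # the empty clause is always needed
                needed.add(cid)
        elif kind == 'a':     # addition, replayed backwards: remove it
            del active[cid]
            if cid in needed:
                if hint is None:
                    hint = find_rup_proof(active, lits)
                needed.update(abs(i) for i in hint)
                revcert.append(('a', cid, lits, hint))
        elif kind == 'd':     # deletion, replayed backwards: re-insert it
            active[cid] = lits
        elif kind == 'o':     # original clause of the input formula
            del active[cid]
            if cid in needed:
                revcert.append(('o', cid, lits, None))
    return revcert
```

Note how an addition that is later deleted and never marked as needed simply disappears, which is exactly the elimination of irrelevant clause additions.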




In the second pass, Renumber(Forig, revcert) reads the input DIMACS file and the temporary file from the first pass, and produces the final result in LRAT format. Not much checking happens in this pass, but we ensure that the o steps in the FRAT file actually appear (up to permutation) in the input. The state maintained in this pass is a list of all active clause IDs together with the corresponding list of LRAT IDs (original clauses are always numbered sequentially by their position in the file, and add/delete steps use a monotonic counter that is incremented on each addition step).
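The ID-renumbering bookkeeping amounts to the following sketch (an assumption for simplicity: `steps` is the forward-order step list as `(kind, old_id)` pairs, with all `'o'` original-clause steps preceding the `'a'` addition steps):

```python
def renumber(steps):
    """Map old FRAT clause ids to new LRAT ids: originals are
    numbered sequentially, and additions continue the same
    monotonic counter in the order they appear."""
    lrat = {}       # old FRAT id -> new LRAT id
    counter = 0
    for kind, old_id in steps:
        if kind in ('o', 'a'):
            counter += 1
            lrat[old_id] = counter
    return lrat

print(renumber([('o', 7), ('o', 3), ('a', 12), ('a', 9)]))
# {7: 1, 3: 2, 12: 3, 9: 4}
```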

The resulting LRAT file can then be verified by any of the verified LRAT checkers [26] (and our toolchain also includes a built-in LRAT checker for verification).

The 2-pass algorithm is used in order to optimize memory usage. The result of the first pass is streamed out, so the intermediate elaboration result never has to be held in memory in its entirety. Once the temporary file is streamed out, we need at least one more pass to reverse it (even if the labels did not need renumbering), since its steps are in reverse order.

### **5 Test results**

We performed benchmarks comparing our FRAT toolchain (modified CaDiCaL + FRAT-to-LRAT elaborator written in Rust) against the DRAT toolchain (standard CaDiCaL + DRAT-trim) and measured their execution times, output file sizes, and peak memory usages while solving SAT instances in the DIMACS format and producing their LRAT proofs. All tests were performed on Amazon EC2 r5a.xlarge instances, running Ubuntu Server 20.04 LTS on 2.5 GHz AMD EPYC 7000 processors with 32 GB RAM and 512 GB SSD.

The instances used in the benchmark were chosen by selecting all 97 instances for which default-mode CaDiCaL returned 'UNSAT' in the 2019 SAT Race results. One of these instances was excluded because DRAT-trim exhausted the available 32 GB of memory and failed during elaboration. Although this instance was not used for the comparisons below, we note that it offers further evidence of the FRAT toolchain's efficient use of memory, since the FRAT-to-LRAT elaboration of this instance succeeded on the same system. The remaining 96 instances were used for the performance comparison of the two toolchains.<sup>4</sup>

Figures 5 and 6 show the time and memory measurements from the benchmark. We can see from Figure 5 that the FRAT toolchain is significantly faster than the DRAT toolchain. Although the modified CaDiCaL tends to be slightly (6%) slower than standard CaDiCaL, that overhead is more than compensated by a median 84% decrease in elaboration time (the sum over all instances is 1700.47 s for the DRAT toolchain vs. 381.70 s for the FRAT toolchain, so the average is down by 77%). If we include the time of the respective solvers, the FRAT + modified CaDiCaL toolchain takes 53.6% of the time of the DRAT + CaDiCaL toolchain on median. The difference in the toolchains' time budgets is clear: the DRAT toolchain spends 42% of its time in solving and 58% in elaboration, while the FRAT toolchain spends 85% on solving and only 15% on elaboration.

Figure 6 shows a dramatic difference in peak memory usage between the FRAT and DRAT toolchains. On median, the FRAT toolchain used only 5.4% as much peak memory as the DRAT toolchain. (The average is 318.62 MB, which is 11.98% of the DRAT toolchain's 2659.07 MB, but this figure is dominated by the largest instances. The maximum memory usage was 2.99 GB for FRAT and 21.5 GB for DRAT, but one instance exhausted the available 32 GB under DRAT and is not included in this figure.) This result agrees with our initial expectations: the FRAT toolchain's 2-pass elaboration method allows it to limit the number of clauses held in memory to the size of the active set used by the solver, whereas the DRAT toolchain loads all clauses in a DRAT file into memory at once during elaboration. This difference suggests that the FRAT toolchain can be used to verify instances that would otherwise exceed the system memory limit on the DRAT toolchain.

There were no noticeable differences in the sizes or verification times of LRAT proofs produced by the two toolchains. On average, LRAT proofs produced by

<sup>4</sup> A CSV of detailed benchmark results can be found at https://github.com/digama0/ frat/blob/tacas/benchmark/benchmark-results.csv.

**Fig. 5.** FRAT vs. DRAT time comparison. The datapoints of 'FRAT total' and 'DRAT total' show the number of instances that each toolchain could generate LRAT proofs for within the given time limit. The datapoints of 'FRAT elab' and 'DRAT elab' show the number of instances whose intermediate format proof files (FRAT or DRAT) could be elaborated to LRAT within the given time limit.

the FRAT toolchain were 1.873% smaller and 3.314% faster<sup>5</sup> to check than those from the DRAT toolchain.

One minor downside of the FRAT toolchain is that it requires the storage of a temporary file during elaboration, but we do not expect this to be a problem in practice since the temporary file is typically much smaller than either the FRAT or LRAT file. In our test cases, the average temporary file size was 28.68% and 47.60% that of FRAT and LRAT files, respectively. In addition, users can run the elaborator with the -m option to bypass temporary files and write the temporary data to memory instead, which further improves performance but foregoes the memory conservation that comes with 2-pass elaboration.

The CaDiCaL modification is only a prototype, and some of its weaknesses show in the data. The general pattern we observed is that on problems for which the predicted CaDiCaL code paths are taken, the generated files have a large number of hints and the elaboration time is negligible (the "FRAT elab" line in Figure 5); but on problems that make use of the more unusual inprocessing operations, many steps with no hints are given to the elaborator, and performance becomes comparable to DRAT-trim. For solver developers, this means that there

<sup>5</sup> One instance was omitted from the LRAT verification time comparison due to what seems to be a bug in the standard LRAT checker included in DRAT-trim. Detailed information regarding this instance can be found at https://github.com/digama0/ frat/blob/tacas/benchmark/README.md.

**Fig. 6.** FRAT vs. DRAT peak memory usage comparison. Each datapoint shows the number of instances that each toolchain could successfully generate LRAT proofs for within the given peak memory usage limit.

is a very direct relationship between proof annotation effort and mean solution + elaboration time. Currently, elaboration of FRAT files with no annotations (the worst-case scenario for the FRAT toolchain) typically takes slightly more than twice as long as elaboration of DRAT files with DRAT-trim, likely due to missing optimizations from DRAT-trim that could be incorporated, but this only underscores the effectiveness of adding hints to the format.

### **6 Related work**

As already mentioned, the FRAT format is most closely related to the DRAT format [8], which it seeks to replace as an intermediate output format for SAT solvers. It is also dependent on the LRAT format and related tools [3], as the FRAT toolchain targets LRAT as the final output format.

The GRAT format [16] and toolchain also aim to improve elaboration of SAT unsatisfiability proofs, but take a different approach from that of FRAT. GRAT retains DRAT as the intermediate format, but uses parallel processing and targets a new final format with more information than LRAT in order to improve overall performance. GRAT also comes with its own verified checker [17].

Specifying and verifying the correctness of SAT solvers themselves (sometimes called the autarkic method, as opposed to the proof-producing skeptical method) is a radically different approach to ensuring correct results. There have been various efforts to verify nontrivial SAT solvers [18,20,19,4,5]. Although these verified solvers have become significantly faster, they cannot compete with the (unverified) state-of-the-art solvers, and certified solvers are also difficult to maintain and modify. On the other hand, proving the correctness of nontrivial SAT solvers can provide new insights about key invariants underlying the techniques used [5].

Generally speaking, devising proof formats for automated reasoning tools and augmenting the tools with proof output capability is an active research area. Notable examples outside SAT solving include the LFSC format for SMT solving [23] and the TSTP format for classical first-order ATPs [24]. In particular, the recent work on the veriT SMT solver [1] is motivated by rationales similar to those for the FRAT toolchain; the key insight is that a proof production pipeline is often easier to optimize on the solver side than on the elaborator side, as the former has direct access to many kinds of useful information.

### **7 Conclusion**

The test results show that the FRAT format and toolchain made significant performance gains relative to their DRAT equivalents in both elaboration time and memory usage. We take this as confirmation of our initial conjectures that (1) there is a large amount of useful and easily extracted information in SAT solvers that is left untapped by DRAT proofs, and (2) the use of streaming verification is the key to verifying very large proofs that cannot be held in memory at once.

The practical ramification is that, provided that solvers produce well-annotated FRAT proofs, the elaborator is no longer a bottleneck in the pipeline. Typically, when DRAT-trim hangs it does so either by taking excessive time, or by attempting to read in an entire proof file at once and exhausting memory (the so-called "uncheckable" proofs that can be produced but not verified). But FRAT-to-LRAT elaboration is typically faster than FRAT production, and the memory consumption of the FRAT-to-LRAT elaborator at any given point is proportional to the memory used by the solver at the same point in the proof. Since LRAT verification is already efficient, the only remaining limiting factor is essentially the time and memory usage of the solver itself.

In addition to performance, the other main consideration in the design of the FRAT format and toolchain was flexibility of use and extension. The encoding of FRAT files allows them to be read and parsed both backward and forward, and the format can be modified to include more advanced inferences, as we have discussed in the example of XOR steps. The optional `l` steps allow SAT solvers to decide precisely when they will provide explicit proofs, thereby striking a workable compromise between implementation complexity and runtime efficiency. SAT solver developers can begin using the format by producing the most bare-bones FRAT proofs with no annotations (essentially DRAT proofs with metadata for original/final clauses) and gradually work toward providing more complete hints. We hope that this combination of efficiency and flexibility will motivate performance-minded SAT solver developers to adopt the format and support more robust proof production, which is presently only an afterthought in most SAT solvers.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Generating Extended Resolution Proofs with a BDD-Based SAT Solver

Randal E. Bryant and Marijn J. H. Heule

Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, United States {Randy.Bryant, mheule}@cs.cmu.edu

Abstract. In 2006, Biere, Jussila, and Sinz made the key observation that the underlying logic behind algorithms for constructing Reduced, Ordered Binary Decision Diagrams (BDDs) can be encoded as steps in a proof in the *extended resolution* logical framework. Through this, a BDD-based Boolean satisfiability (SAT) solver can generate a checkable proof of unsatisfiability. Such proofs indicate that the formula is truly unsatisfiable without requiring the user to trust the BDD package or the SAT solver built on top of it.

We extend their work to enable arbitrary existential quantification of the formula variables, a critical capability for BDD-based SAT solvers. We demonstrate the utility of this approach by applying a prototype solver to obtain polynomially sized proofs on benchmarks for the mutilated chessboard and pigeonhole problems—ones that are very challenging for search-based SAT solvers.

Keywords: extended resolution, binary decision diagrams, mutilated chessboard, pigeonhole problem

### 1 Introduction

When a Boolean satisfiability (SAT) solver returns a purported solution to a Boolean formula, its validity can easily be checked by making sure that the solution indeed satisfies the formula. When the formula is unsatisfiable, on the other hand, having the solver simply declare this to be the case requires the user to have faith in the solver, a complex piece of software that could well be flawed. Indeed, modern solvers employ a number of sophisticated techniques to reduce the search space. If one of those techniques is invalid or incorrectly implemented, the solver may overlook actual solutions and label a formula as unsatisfiable, even when it is not.

With SAT solvers providing the foundation for a number of different real-world tasks, this "false negative" outcome could have unacceptable consequences. For example, when used as part of a formal verification system, the usual strategy is to encode some undesired property of the system as a formula. The SAT solver is then used to determine whether some operation of the system could lead to this undesirable property. Having the solver declare the formula to be unsatisfiable is an indication that the undesirable behavior cannot occur, but only if the formula is truly unsatisfiable.

Supported by the National Science Foundation under grant CCF-2010951

<sup>©</sup> The Author(s) 2021

J. F. Groote and K. G. Larsen (Eds.): TACAS 2021, LNCS 12651, pp. 76–93, 2021. https://doi.org/10.1007/978-3-030-72016-2_5

Rather than requiring users to place their trust in a complex software system, a *proof-generating* solver constructs a proof that the formula is indeed unsatisfiable. The proof has a form that can readily be checked by a simple proof checker. Initial work of checking unsatisfiability results was based on resolution proofs, but modern checkers are based on stronger proof systems [16,33]. The checker provides an independent validation that the formula is indeed unsatisfiable. The checker can even be simple enough to be formally verified [9,23,29]. Such a capability has become an essential feature for modern SAT solvers.

In their 2006 papers [21,28], Jussila, Sinz, and Biere made the key observation that the underlying logic behind algorithms for constructing Reduced, Ordered Binary Decision Diagrams (BDDs) [4] can be encoded as steps in a proof in the *extended resolution* logical framework [30]. Through this, a BDD-based Boolean satisfiability solver can generate checkable proofs of unsatisfiability for a set of clauses. Such proofs indicate that the formula is truly unsatisfiable without requiring the user to trust the BDD package or the SAT solver built on top of it.

In this paper, we refine these ideas to enable a full-featured, BDD-based SAT solver. Chief among these refinements is the ability to perform existential quantification on arbitrary variables. (Jussila, Sinz, and Biere [21] extended their original work [28] to allow existential quantification, but only for the root variable of a BDD.) In addition, we allow greater flexibility in the choice of variable ordering and the order in which conjunction and quantification operations are performed. This combination allows a wide range of strategies for creating a sequence of BDD operations that, starting with a set of input clauses, yield the BDD representation of the constant function 0, indicating that the formula is unsatisfiable. Using the extended-resolution proof framework, these operations can generate a proof showing that the original set of clauses logically implies the empty clause, providing a checkable proof that the formula is unsatisfiable.

As the experimental results demonstrate, our refinements enable a proof-generating BDD-based SAT solver to achieve polynomial performance on several classic "hard" problems [1,15]. Since the performance of a proof-generating SAT solver affects not only the runtime of the program, but also the length of the proofs generated, achieving polynomial performance is an important step forward. Our results for these benchmarks rely on a novel approach to ordering the conjunction and quantification operations, inspired by symbolic model checking [7].

This paper is structured as follows. First, it provides a brief introduction to the resolution and extended resolution logical frameworks and to BDDs. Then we show how a BDD-based SAT solver can generate proofs by augmenting algorithms for computing the conjunction of two functions represented as BDDs, and for checking that one function logically implies another. We then describe our prototype implementation and evaluate its performance on several classic problems. We conclude with some general observations and suggestions for further work.

### 2 Preliminaries

Given a Boolean formula over a set of variables $\{x_1, x_2, \ldots, x_n\}$, a SAT solver attempts to find an assignment to these variables that will satisfy the formula, or it declares that the formula is unsatisfiable. As is standard practice, we use the term *literal* to refer to either a variable or its complement. Most SAT solvers use Boolean formulas expressed in *conjunctive normal form*, where the formula consists of a set of *clauses*, each consisting of a set of literals. Each clause is a disjunction: if an assignment sets any of its literals to true, the clause is considered to be satisfied. The overall formula is a conjunction: a satisfying assignment must satisfy all of the clauses.
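The clause/formula semantics just described can be made concrete in a few lines. This sketch uses DIMACS-style integer literals (a positive integer for a variable, its negation for the complement), a common but here merely assumed convention:

```python
def satisfies(clauses, assignment):
    """Check a purported solution against a CNF formula.
    `clauses` is a list of lists of nonzero ints; `assignment`
    maps each variable number to a bool."""
    def lit_true(lit):
        value = assignment[abs(lit)]
        return value if lit > 0 else not value
    # every clause (a disjunction) must contain a true literal
    return all(any(lit_true(l) for l in clause) for clause in clauses)

# (x1 ∨ ¬x2) ∧ (x2 ∨ x3)
cnf = [[1, -2], [2, 3]]
print(satisfies(cnf, {1: True, 2: False, 3: True}))   # True
print(satisfies(cnf, {1: False, 2: False, 3: False})) # False
```

This is exactly the easy direction mentioned in the introduction: validating a satisfying assignment is cheap, while certifying unsatisfiability requires a proof.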

We write $\top$ to denote both tautology and logical truth, and $\bot$ to represent both an empty clause and logical falsehood. When writing clauses, we omit disjunction symbols and use overlines to denote negation, writing $u \vee v \vee \overline{w}$ as $u\,v\,\overline{w}$.

#### 2.1 (Extended) Resolution Proofs

Robinson [26] observed that a single inference rule could form the basis for a refutation theorem-proving technique for first-order logic. Here, we consider its specialization to propositional logic. For clauses of the form $C \vee x$ and $\overline{x} \vee D$, the resolution rule derives the new clause $C \vee D$. This inference is written with a notation showing the required conditions above a horizontal line, and the resulting inference (the *resolvent*) below:

$$\frac{C \lor x \quad \overline{x} \lor D}{C \lor D}$$

Resolution provides a mechanism for proving that a set of clauses is unsatisfiable. Suppose the input consists of $m$ clauses. A resolution proof is given as a *trace* consisting of a series of *steps* $S$, where each step $s_i$ consists of a clause $C_i$ and a (possibly empty) list of antecedents $A_i$, where each antecedent is the index of one of the previous steps. The first set of steps, denoted $S_m$, consists of the input clauses without any antecedents. Each successive step then consists of a clause and a set of antecedents, such that the clause can be derived from the clauses in the antecedents by one or more resolution steps. It follows by transitivity that for each step $s_i$, with $i > m$, clause $C_i$ is logically implied by the input clauses, written $S_m \models C_i$. If, through a series of steps, we can reach a step $s_t$ where $C_t$ is the empty clause, then the trace provides a proof that $S_m \models \bot$, i.e., the set of input clauses is not satisfiable.
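The resolution rule and the trace structure above can be sketched as a minimal checker. One simplifying assumption not in the text: each derived step here resolves exactly two antecedents on a single pivot, whereas a real trace step may chain several resolutions:

```python
def resolve(c, d, x):
    """Resolvent of clauses c (containing x) and d (containing -x).
    Clauses are frozensets of DIMACS-style integer literals."""
    assert x in c and -x in d
    return (c - {x}) | (d - {-x})

def check_trace(steps):
    """Each step is (clause, antecedents); input clauses have no
    antecedents, and derived clauses must be the resolvent of
    their two antecedents on a unique pivot."""
    derived = []
    for clause, ants in steps:
        if ants:
            ci, di = derived[ants[0]], derived[ants[1]]
            pivots = {l for l in ci if -l in di}
            assert len(pivots) == 1, "expected a single pivot"
            assert resolve(ci, di, pivots.pop()) == clause
        derived.append(clause)
    return True

# refutation of the input clauses {x1} and {¬x1}
trace = [(frozenset({1}), []), (frozenset({-1}), []),
         (frozenset(), [0, 1])]
print(check_trace(trace))  # True: the empty clause is derived
```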

Tseitin [30] introduced the extended-resolution proof framework in 1966. It allows the addition of new *extension* variables to a resolution proof in a manner that preserves the integrity of the proof. In particular, in introducing variable $e$, there must be an accompanying set of clauses that encode $e \leftrightarrow F$, where $F$ is a formula over variables (both original and extension) that were introduced earlier. These are referred to as the *defining clauses* for extension variable $e$. Variable $e$ then provides a shorthand notation by which $F$ can be referenced multiple times. Doing so can reduce the size of a clausal representation of a problem by an exponential factor.

An extension variable $e$ is introduced into the proof by including its defining clauses in the list of clauses being generated. The proof checker must ensure that these added clauses do not artificially restrict the set of satisfying solutions. The checker can do this by making sure that the defining clauses are *blocked* with respect to variable $e$ [22]. That is, for each defining clause $C$ containing literal $e$ and each defining clause $D$ containing literal $\overline{e}$, there must be some literal $l$ in $C$ such that its complement $\overline{l}$ is in $D$. As a result, resolving clauses $C$ and $D$ will yield a tautology.
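The blocking condition is a purely syntactic check, sketched below with DIMACS-style integer literals (an assumed convention, not from the text); the example uses the standard defining clauses of $e \leftrightarrow (x \wedge y)$:

```python
def blocked_on(c, d, e):
    """Check the blocking condition for literal e: clause c contains
    e, clause d contains its complement, and some other literal of c
    appears complemented in d, so the resolvent on e is a tautology."""
    assert e in c and -e in d
    return any(-l in d for l in c if l != e)

# defining clauses of e <-> (x AND y): {-e,x}, {-e,y}, {e,-x,-y}
e, x, y = 3, 1, 2
print(blocked_on({-e, x}, {e, -x, -y}, -e))  # True: x vs -x
print(blocked_on({-e, y}, {e, -x, -y}, -e))  # True: y vs -y
```

A checker runs this test over every pair of defining clauses with opposite occurrences of the extension variable before admitting them to the proof.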

Tseitin transformations are commonly used to encode a logic circuit or formula as a set of clauses without requiring the formulas to be "flattened" into a conjunctive normal form over the circuit inputs or formula variables. These introduced variables are called *Tseitin variables* and are considered to be part of the input formula. An extended resolution proof takes this concept further by introducing additional variables as part of the proof. Some problems for which the minimum resolution proof must be of exponential size can be expressed with polynomial-sized proofs in extended resolution [8].

To validate the proofs, we use a clausal proof system, known as Resolution Asymmetric Tautology (RAT), that generalizes extended resolution [32]. RAT is used in industry and to validate the results of the SAT competitions [18]. There are various fast and formally-verified RAT proof checkers [10,23,29].

Clausal proofs also allow the removal of clauses. In our use, we delete clauses when the program can determine that they will not be referenced as antecedents for any succeeding clauses. As the experimental results of Section 4 demonstrate, deleting clauses that are no longer needed can substantially reduce the number of clauses the checker must track while processing a proof.

#### 2.2 Binary Decision Diagrams

Reduced, Ordered Binary Decision Diagrams (which we refer to as simply "BDDs") provide a canonical form for representing Boolean functions, and an associated set of algorithms for constructing them and testing their properties. A number of tutorials have been published [2,5,6], providing a background on BDDs and their algorithms.

With BDDs, functions are defined over a set of variables $X = \{x_1, x_2, \ldots, x_n\}$. We let $L_1$ and $L_0$ denote the two leaf nodes, representing the constant functions 1 and 0, respectively. Each nonterminal node $u$ has an associated variable Var(u) and children Hi(u), indicating the case where the node variable has value 1, and Lo(u), indicating the case where the node variable has value 0.

Nodes are stored in a *unique table*, indexed by the key ⟨Var(u), Hi(u), Lo(u)⟩, so that isomorphic nodes are never created. The nodes are shared across all of the generated BDDs [24]. In presenting algorithms, we assume a function GETNODE(x, $u_1$, $u_0$) that checks the unique table for a node with variable $x$ and children $u_1$ and $u_0$. It either returns the node stored there, or it creates a new node and enters it into the table. With this table, we can guarantee that the subgraphs with root nodes $u$ and $v$ represent the same Boolean function if and only if $u = v$. We can therefore identify Boolean functions with their BDD root nodes.
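A minimal unique-table sketch (not the authors' implementation: leaves are modeled as the ints 0 and 1, and nonterminal nodes as interned `(var, hi, lo)` tuples) shows how GETNODE guarantees canonicity:

```python
class BDD:
    """Toy unique table: isomorphic nodes collapse to one object,
    so pointer equality decides functional equality."""
    def __init__(self):
        self.unique = {}   # (var, hi, lo) -> interned node

    def get_node(self, x, hi, lo):
        if hi == lo:               # redundant test: skip the node
            return hi
        key = (x, hi, lo)
        if key not in self.unique:
            self.unique[key] = key # intern the node
        return self.unique[key]

bdd = BDD()
u = bdd.get_node(1, 1, 0)   # the function x1
v = bdd.get_node(1, 1, 0)
print(u is v)               # True: same function, same node object
```

In a real package each node would also carry its extension-variable identifier, as described in Section 3.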

BDD packages support multiple operations for constructing and testing the properties of Boolean functions represented by BDDs. A number of these are based on the *Apply* algorithm [4]. Given BDDs $u$ and $v$ representing functions $f$ and $g$, respectively, and a Boolean operation (e.g., AND), the algorithm generates the BDD representation $w$ of the operation applied to those functions (e.g., $f \wedge g$). For each operation, the program maintains an *operation cache* indexed by the argument nodes $u$ and $v$, mapping to the result node $w$. With this cache, the worst-case number of recursive steps required by the algorithm is bounded by the product of the sizes (in nodes) of the arguments.

We use the term APPLYAND to refer to the Apply algorithm for Boolean operation ∧ and APPLYOR to refer to the Apply algorithm for Boolean operation ∨.
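The recursive structure of Apply, which Section 3.2 later augments with proof generation, can be sketched without proofs as follows (a self-contained toy model: leaves are the ints 0 and 1, nodes are `(var, hi, lo)` tuples, and there is no shared unique table):

```python
def get_node(x, hi, lo):
    """Minimal node maker: skip redundant tests (hi == lo)."""
    return hi if hi == lo else (x, hi, lo)

def apply_and(u, v, cache=None):
    """Sketch of the Apply algorithm for AND. The operation cache,
    indexed by the argument pair, bounds the number of recursive
    steps by the product of the argument sizes."""
    if cache is None:
        cache = {}
    # terminal cases
    if u == v: return u
    if u == 0 or v == 0: return 0
    if u == 1: return v
    if v == 1: return u
    if (u, v) in cache: return cache[(u, v)]
    # branch on the minimum of the two root variables
    x = min(u[0], v[0])
    u1, u0 = (u[1], u[2]) if u[0] == x else (u, u)
    v1, v0 = (v[1], v[2]) if v[0] == x else (v, v)
    w = get_node(x, apply_and(u1, v1, cache), apply_and(u0, v0, cache))
    cache[(u, v)] = w
    return w

u = get_node(1, 1, 0)      # the function x1
v = get_node(2, 1, 0)      # the function x2
print(apply_and(u, v))     # (1, (2, 1, 0), 0), i.e. x1 AND x2
```

The proof-generating version of Section 3.2 follows exactly this control flow, additionally returning a justifying proof step alongside each node.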

### 3 Proof Generation During BDD Construction

In our formulation, every newly created BDD node u is assigned an extension variable. (As notation, we use the same name for the node and for its extension variable.) We then extend the Apply algorithm to generate proofs based on the recursive structure of the BDD operations.

Let $S_m$ denote the set of input clauses. Our goal is to generate a proof that $S_m \models \bot$, i.e., there is no satisfying assignment for these clauses. Our BDD-based approach generates a sequence of BDDs with root nodes $u_1, u_2, \ldots, u_t$, where $u_t = L_0$, based on a combination of the following operations. (The exact sequencing of operations is determined by the *evaluation mechanism*, as described in Section 4.)


Although the existential quantification operation is not mandatory for a BDD-based SAT solver, it can greatly improve its performance [13]. It is the BDD counterpart to Davis-Putnam variable elimination on clauses [11]. As the notation indicates, there are often multiple variables that can be eliminated simultaneously. Although the operation can cause a BDD to increase in size, it generally causes a reduction. Our experimental results demonstrate the importance of this operation.

As these operations proceed, we simultaneously generate a set of proof steps. The details of each step are given later in the presentation. For each BDD generated, we maintain the proof invariant that its root node $u_j$ satisfies $S_m \models u_j$.

	- (a) Using a modified version of the APPLYAND algorithm, we follow the structure of its recursive calls to generate a proof that the algorithm preserves implication: $u_j \wedge u_k \rightarrow u_l$. This is described in Section 3.2.
	- (b) This implication can be combined with the earlier proofs that $S_m \models u_j$ and $S_m \models u_k$ to prove $S_m \models u_l$.
	- (a) Following the generation of $u_k$ via existential quantification, we perform a separate check that $u_j \rightarrow u_k$. This check uses a proof-generating version of the Apply algorithm for implication testing that we refer to as PROVEIMPLICATION. This is described in Section 3.3.
	- (b) This implication can be combined with the earlier proof that $S_m \models u_j$ to prove $S_m \models u_k$.

As case 3(a) states, we do not attempt to track the detailed logic underlying the quantification operation. Instead, we run a separate check that the quantification preserves implication. As is the case with many BDD packages, our implementation can perform existential quantification of an arbitrary set of variables in a single pass over the argument BDD. A single implication test suffices for the entire quantification.

Sinz and Biere's formulation of proof generation by a BDD-based SAT solver [28] introduces special extension variables $n_1$ and $n_0$ to represent the BDD leaves $L_1$ and $L_0$. Their proof then includes unit clauses $n_1$ and $\overline{n}_0$ to force these variables to be set to 1 and 0, respectively. This formulation greatly reduces the number of special cases to consider in the proof-generating version of the APPLYAND operation, but it complicates the generation of resolution proofs for the implication test. Instead, we directly associate leaves $L_1$ and $L_0$ with $\top$ and $\bot$, respectively.

The n variables in the input clauses all have associated BDD variables. The proof then introduces an extension variable every time a new BDD node is created. In the following presentation, we use the node name (e.g., u) to indicate the associated extension variable. In the actual implementation, the extension variable identifier (an integer) is stored as one of the fields in the node representation.

When creating a new node, the GETNODE function adds (up to) four defining clauses for the associated extension variable. For node u with variable Var(u) = x, Hi(u) = u1, and Lo(u) = u0, the clauses are:


The names for these clauses combine an indication of whether they correspond to variable $x$ being 1 (H) or 0 (L) and whether they form an implication from the node down to its child (D) or from the child up to its parent (U). When either node $u_0$ or $u_1$ is a leaf node, some of these clauses degenerate to tautologies. Such clauses are omitted from the proof. Each clause is numbered according to its position in the sequence of clauses comprising the proof. These defining clauses encode the assertion $u \leftrightarrow \mathit{ITE}(x, u_1, u_0)$, where *ITE* denotes the *if-then-else* operation, defined as $\mathit{ITE}(x, y, z) = (x \wedge y) \vee (\overline{x} \wedge z)$. As can be seen, the defining clauses are blocked with respect to extension variable $u$.
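The four defining clauses can be generated mechanically. The table of clauses did not survive extraction here, so the following is a reconstruction sketch: the clause shapes are the standard Tseitin encoding of $u \leftrightarrow \mathit{ITE}(x, u_1, u_0)$, and the names follow the H/L and D/U scheme described in the text (DIMACS-style integer literals are an assumed convention):

```python
def defining_clauses(u, x, u1, u0):
    """Defining clauses for extension variable u with
    u <-> ITE(x, u1, u0). This sketch emits all four; the real
    tool omits the ones that degenerate to tautologies when a
    child is a leaf."""
    return {
        'HD': [-u, -x,  u1],   # u  AND  x -> u1  (high, downward)
        'HU': [ u, -x, -u1],   # u1 AND  x -> u   (high, upward)
        'LD': [-u,  x,  u0],   # u  AND ~x -> u0  (low, downward)
        'LU': [ u,  x, -u0],   # u0 AND ~x -> u   (low, upward)
    }

print(defining_clauses(u=4, x=1, u1=2, u0=3))
```

Each clause pair with opposite occurrences of `u` (e.g. HD against HU) also contains a complementary pair on `u1` or `x`, so the clauses are blocked with respect to `u`, as required.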

#### 3.1 Generating BDD Representations of Clauses

The BDD representation $u$ of a clause $C$ is generated by using the APPLYOR operation on the BDD representations of its literals. This BDD has a simple, linear structure with one node for each literal. Each successive node has a branch to leaf node $L_1$ when the literal is true and to the next node in the chain when the literal is false. The proof that $C \models u$ is based on this linear structure, employing the upward defining clauses HU and LU for the nodes in the chain [28].

#### 3.2 The APPLYAND Operation

The key idea in generating proofs for the AND operation is to follow the recursive structure of the Apply algorithm. We do this by integrating proof generation into the


Fig. 1. Terminal cases and recursive step of APPLYAND operation, modified for proof generation. Each call returns both a node and a proof step.

APPLYAND procedure. The overall control flow is identical to the standard version, except that the function returns both a BDD node $w$ and a step number $s$. For arguments $u$ and $v$, the generated step $s$ has clause $\overline{u}\,\overline{v}\,w$, along with antecedents defining a resolution proof of the implication $u \wedge v \rightarrow w$. We refer to this as the *justification* for the operation. The operation cache is modified to hold both the returned node and the justifying step number as values.

Figure 1 shows the main components of the implementation. When the two arguments are equal or either argument is a leaf node, the recursion terminates (left). These cases have tautologies as their justification. Failing a terminal case, the code checks the operation cache for matching arguments $u$ and $v$, returning the cached result if found.

Failing the terminal case tests and the cache lookup, the program proceeds as shown in the procedure APPLYANDRECUR (right). Here, the procedure branches on the variable x that is the minimum of the two root variables. The procedure accumulates a set of steps J to be used in the implication proof. These include the two steps (possibly tautologies) from the two recursive calls. At the end, it invokes a function JUSTIFYAND to generate the required proof. It stores both the result node w and the proof step s in the operation cache, and it provides these values as the return values.
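The control flow described above can be sketched as follows. This is a minimal stand-in, not PGBDD's actual code: nodes are (var, hi, lo) tuples, leaves are Python booleans, and the `steps` counter merely stands in for the clause numbers that JUSTIFYAND would emit (at most two per operation).

```python
class BDD:
    def __init__(self):
        self.unique = {}   # unique table: (var, hi, lo) -> node
        self.cache = {}    # operation cache: (u, v) -> (w, step)
        self.steps = 0     # stand-in for the proof-step counter

    def getnode(self, var, hi, lo):
        if hi == lo:       # reduction rule: drop a redundant test
            return hi
        return self.unique.setdefault((var, hi, lo), (var, hi, lo))

    def apply_and(self, u, v):
        # Terminal cases: the justification is a tautology (step None).
        if u is False or v is False:
            return False, None
        if u is True:
            return v, None
        if v is True or u == v:
            return u, None
        if (u, v) in self.cache:   # reuse result and justifying step
            return self.cache[(u, v)]
        x = min(u[0], v[0])        # branch on the minimum root variable
        u1, u0 = (u[1], u[2]) if u[0] == x else (u, u)
        v1, v0 = (v[1], v[2]) if v[0] == x else (v, v)
        # s1 and s0 would be handed to JUSTIFYAND as antecedents.
        w1, s1 = self.apply_and(u1, v1)
        w0, s0 = self.apply_and(u0, v0)
        w = self.getnode(x, w1, w0)
        self.steps += 1            # JUSTIFYAND would emit its clause(s) here
        result = (w, self.steps)
        self.cache[(u, v)] = result
        return result
```

A real implementation would pass the recursive steps s1 and s0, together with the relevant defining clauses, to JUSTIFYAND; here they are retained only to mirror the control flow.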

Proof Generation for the General Case. Proving that the nodes generated by APPLYAND satisfy the implication property proceeds by induction on the structure of the argument

Fig. 2. Resolution proof for general step of the APPLYAND operation

and result BDDs. That is, it can assume that the results w<sub>1</sub> and w<sub>0</sub> of the recursive calls on arguments u<sub>1</sub>, v<sub>1</sub> and on u<sub>0</sub>, v<sub>0</sub> satisfy the implications u<sub>1</sub> ∧ v<sub>1</sub> → w<sub>1</sub> and u<sub>0</sub> ∧ v<sub>0</sub> → w<sub>0</sub>, and that these calls generated proof steps s<sub>1</sub> and s<sub>0</sub> justifying these implications. Figure 2 shows the structure of the resolution proof for the general case, where none of the equalities hold and the recursive calls do not yield tautologies. The proof relies on the following clauses as antecedents, arising from the recursive calls and from the defining clauses for nodes u, v, and w:


Along the left, the clauses cover the case of x = 1, first resolving clauses ANDH and WHU, then resolving the result with clause UHD and then with clause VHD. A similar progression along the right covers the case of x = 0. The two chains are then merged by resolving on variable x to yield the final implication. As this figure illustrates, a total of seven resolution steps is required. These can be merged into two linear resolution chains, and so the proof generator produces at most two clauses per APPLYAND operation.

Proof Generation for Special Cases. The proof structure shown in Figure 2 only holds for the most general form of the recursion. However, there are many special cases, such as when the recursive calls yield tautologous results, when some of the child nodes are equal, and when the two recursive calls return the same node.

Our method for handling both the general and special cases relies on the V-shaped structure of the proofs, as is illustrated in Figure 2. That is, there are two linear chains, one along the left and one along the right consisting of some subsequence of the following clauses:

$$A\_H = \text{ANDH}, \text{WHU}, \text{UHD}, \text{VHD}$$

$$A\_L = \text{ANDL}, \text{WLU}, \text{ULD}, \text{VLD}$$

These will be proper subsequences when some of the clauses are not included in the set J in APPLYAND (Figure 1) or are tautologies. In addition, some of the clauses may be extraneous and therefore must not occur as antecedents.

Rather than trying to enumerate the special cases, we found it better to create a general-purpose linear chain resolver that handles all of the cases in a uniform way. This resolver is called on each of the clause sequences A<sub>H</sub> and A<sub>L</sub>. It proceeds through a sequence of clauses, discarding any tautologies and any clauses that do not resolve with the result so far. It then emits the proof clauses with the selected antecedents.
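A sketch of such a resolver, under our own representation of clauses as sets of DIMACS-style integer literals (illustrative, not PGBDD's code):

```python
def is_tautology(clause):
    """A clause containing both a literal and its negation is a tautology."""
    return any(-lit in clause for lit in clause)

def chain_resolve(clauses):
    """Linear chain resolution over a sequence of clauses.

    Tautologies and clauses that do not resolve with the running resolvent
    (no pivot, or more than one, which would yield a tautology) are
    discarded; the clauses actually used become the antecedent list.
    """
    result, antecedents = None, []
    for c in clauses:
        c = frozenset(c)
        if is_tautology(c):
            continue
        if result is None:
            result, antecedents = c, [c]
            continue
        pivots = {l for l in result if -l in c}
        if len(pivots) != 1:
            continue
        p = next(iter(pivots))
        result = (result - {p}) | (c - {-p})
        antecedents.append(c)
    return result, antecedents
```

Running the resolver on, say, the chain (¬a ∨ b), (¬b ∨ c), a tautology, (¬c ∨ d) yields the resolvent ¬a ∨ d with the three non-tautologous clauses as antecedents.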

#### 3.3 Testing Implication


Fig. 3. Terminal cases and recursive step of PROVEIMPLICATION operation

When the existential quantification operation applied to node u generates node v, the program generates a proof that u → v by calling procedure PROVEIMPLICATION with u and v as arguments. This procedure has the same recursive structure as the Apply algorithm, except that it does not generate any new nodes. It only returns the step number for a proof of the clause ¬u ∨ v. It uses an operation cache, but only to hold proof step numbers. Figure 3 shows the terminal cases for this procedure, as well as the recursion that occurs when no terminal case applies and the arguments are not found in the operation cache. A failure of the implication test indicates an error in the solver, and so the procedure signals a fatal error if the implication does not hold.

Each recursive step accumulates up to six proof steps as the set J to be used in the implication proof:


Fig. 4. Resolution proof for general step of the PROVEIMPLICATION operation

The resolution proof for the general case is shown in Figure 4. It has a similar structure to the proof for the APPLYAND operation, with two linear chains combined by a resolution on variable x. Our same general-purpose linear chain resolver can handle both the general case and the many special cases that arise.

### 4 Experimental Results

We implemented the proof-generating SAT solver PGBDD (for Proof-Generating BDD). It is written entirely in Python and consists of around 2000 lines of code, including a BDD package, support for generating extended-resolution proofs, and the overall SAT solver framework.<sup>1</sup>

Although slow, it can handle large enough benchmarks to provide useful insights into the potential for a BDD-based SAT solver to generate proofs of challenging problems, especially when quantification is supported. It generates proofs in the LRAT format [9].

Our BDD package supports mark-and-sweep garbage collection. It starts the marking from the root nodes of all active terms in the sequence u<sub>1</sub>, u<sub>2</sub>, .... Following the marking phase, it traverses the unique table and eliminates the unmarked nodes. It also traverses the operation caches and eliminates any entries for which one of the argument nodes or the result node is unmarked. When a node is deleted, the solver can also direct the proof checker to delete its defining clauses. Similarly, when an entry is deleted from the operation cache, the solver can direct the proof checker to delete the clauses added while generating the justification for the entry.
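The mark-and-sweep pass might look as follows. This is an illustrative sketch with our own data layout (nodes as (var, hi, lo) tuples, leaves as booleans), not PGBDD's implementation; the returned dead nodes and cache entries are where the solver would direct the proof checker to delete the corresponding clauses.

```python
def collect(roots, unique, caches):
    """Mark-and-sweep over the unique table and operation caches.

    unique maps (var, hi, lo) keys to nodes; each cache maps argument
    tuples to (result, step) pairs. Returns what was reclaimed.
    """
    marked = set()
    stack = list(roots)
    while stack:                       # mark phase: reachable from the roots
        n = stack.pop()
        if isinstance(n, tuple) and n not in marked:
            marked.add(n)
            stack.extend((n[1], n[2]))
    dead_nodes = [k for k, n in unique.items() if n not in marked]
    for k in dead_nodes:
        del unique[k]                  # defining clauses may now be deleted
    dead_entries = []
    for cache in caches:
        for key, (res, step) in list(cache.items()):
            nodes = [n for n in (*key, res) if isinstance(n, tuple)]
            if any(n not in marked for n in nodes):
                dead_entries.append(key)
                del cache[key]         # justification clauses may be deleted
    return dead_nodes, dead_entries
```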

In addition to the input CNF file, the program can accept a variable-ordering file, mapping the input variables in the CNF to their levels in the BDD.

The solver supports three different evaluation mechanisms:


<sup>1</sup> The solver, along with code for generating and testing a set of benchmarks, is available at https://github.com/rebryant/pgbdd-artifact.

Linear: Form the conjunction of the BDD representations of the input clauses, proceeding through them in sequence.

Bucket elimination: Place the BDD representations into buckets according to their root variables, and process the buckets from lowest to highest. Within a bucket, conjoin the functions, existentially quantify the root variable and place the result in the appropriate bucket [12]. This matches the operation described in [21].

Scheduled: Perform operations as specified by a scheduling file. This file contains a sequence of lines, each providing a command in a simple, stack-based notation:


In generating benchmarks, we wrote programs to generate the CNF files, the variable orderings, and the schedules in a unified framework.

For all of our benchmarks we report the total number of clauses in the proof, including the input clauses, the defining clauses for the extension variables (up to four per BDD node generated), and the derived clauses (one per input clause and up to two per result inserted into either *AndCache* or *ImplyCache*).

We compare the performance of our BDD-based SAT solver with that of KISSAT, the winner of the 2020 SAT competition [3], representing the state of the art in search-based SAT solvers.

#### 4.1 Mutilated Chessboard

The mutilated chessboard problem considers an n × n chessboard with the corners on the upper left and the lower right removed. It attempts to tile the board with dominos, with each domino covering two squares. Since the two removed squares have the same color, and each domino covers one white and one black square, no tiling is possible. This problem has been well studied in the context of resolution proofs, for which it can be shown that any proof must be of exponential size [1].

A standard CNF encoding involves defining Boolean variables to represent the boundaries between adjacent squares, set to 1 when a domino spans the two squares, and set to 0 otherwise. The clauses then encode an Exactly1 constraint for each square, requiring each square to share a domino with exactly one of its neighbors. We label the variable representing the horizontal boundary between a square and the one below it as y<sub>i,j</sub>, with 1 ≤ i < n and 1 ≤ j ≤ n. The variables representing the vertical boundaries are labeled x<sub>i,j</sub>, with 1 ≤ i ≤ n and 1 ≤ j < n. With a mutilated chessboard, we have y<sub>1,1</sub> = x<sub>1,1</sub> = y<sub>n−1,n</sub> = x<sub>n,n−1</sub> = 0.
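Assuming the boundary-variable scheme just described, a generator for this CNF can be sketched as follows. The variable numbering and function names are ours, not those of the paper's benchmark generator.

```python
from itertools import count

def exactly1(lits):
    """Exactly1 as clauses: one at-least-one clause plus pairwise at-most-one."""
    yield list(lits)
    for a in range(len(lits)):
        for b in range(a + 1, len(lits)):
            yield [-lits[a], -lits[b]]

def chessboard_cnf(n):
    """CNF for the mutilated n x n board with squares (1,1) and (n,n) removed.

    y[(i,j)] is the horizontal boundary below square (i,j);
    x[(i,j)] is the vertical boundary to its right.
    """
    var = count(1).__next__
    y = {(i, j): var() for i in range(1, n) for j in range(1, n + 1)}
    x = {(i, j): var() for i in range(1, n + 1) for j in range(1, n)}
    clauses = []
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if (i, j) in ((1, 1), (n, n)):      # mutilated squares
                continue
            lits = [v for v in (y.get((i - 1, j)), y.get((i, j)),
                                x.get((i, j - 1)), x.get((i, j))) if v]
            clauses.extend(exactly1(lits))
    # Boundaries touching the removed corners are forced to 0.
    for v in (y[(1, 1)], x[(1, 1)], y[(n - 1, n)], x[(n, n - 1)]):
        clauses.append([-v])
    return clauses
```

For small n, a brute-force check over all assignments confirms that the resulting formula is unsatisfiable, as the coloring argument predicts.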

As the log-log plot in Figure 5 shows, PGBDD has exponential performance when using linear conjunction or bucket elimination. Indeed, KISSAT outperforms PGBDD when operating in these modes. However, KISSAT can also be seen to have exponential performance—to reach n = 22, it generates a proof with over 136 million clauses.

On the other hand, another approach, inspired by symbolic model checking [7], yields polynomial performance. It is based on the following observation: when processing the columns from left to right, the only information required to place dominos in column j is the identity of those rows i for which a domino crosses horizontally from column j − 1 to column j. This information is encoded in the values of x<sub>i,j−1</sub> for 1 ≤ i ≤ n.


Fig. 5. Total number of clauses in proofs of n × n mutilated chessboards. The proofs using the column-scanning approach grow as n<sup>2.69</sup>.

Let us group the variables into columns, with X<sub>j</sub> denoting variables x<sub>1,j</sub>, ..., x<sub>n,j</sub>, and Y<sub>j</sub> denoting variables y<sub>1,j</sub>, ..., y<sub>n−1,j</sub>. Scanning the board from left to right, consider X<sub>j</sub> to encode the "state" of processing after completing column j. As the scanning process reaches column j, there is a *characteristic function* σ<sub>j−1</sub>(X<sub>j−1</sub>) describing the set of allowed crossings of horizontally oriented dominos from column j − 1 into column j. No other information about the configuration of the board to the left is required. The characteristic function after column j can then be computed as:

$$
\sigma\_j(X\_j) = \exists X\_{j-1} \left[ \sigma\_{j-1}(X\_{j-1}) \land \exists Y\_j \ T\_j(X\_{j-1}, Y\_j, X\_j) \right] \tag{1}
$$

where T<sub>j</sub>(X<sub>j−1</sub>, Y<sub>j</sub>, X<sub>j</sub>) is a "transition relation" consisting of the conjunction of the Exactly1 constraints for column j. From this, we can existentially quantify the variables Y<sub>j</sub> to obtain a BDD encoding all compatible combinations of the variables X<sub>j−1</sub> and X<sub>j</sub>. By conjoining this with the characteristic function for column j − 1 and existentially quantifying the variables X<sub>j−1</sub>, we obtain the characteristic function for column j. With a mutilated chessboard, we generate leaf node L<sub>0</sub> in attempting the final conjunction. Note that Equation (1) does not represent a reformulation of the mutilated chessboard problem. It simply defines a way to schedule the conjunction and quantification operations over the input clauses and variables.
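To make the recurrence of Equation (1) concrete, the following sketch performs the same column scan with explicit sets of crossing patterns in place of BDD characteristic functions. This is our own illustrative reformulation; the actual solver operates symbolically on BDDs.

```python
def successors(free_rows):
    # All ways to cover a sorted tuple of free rows within one column: each
    # row either sends a horizontal domino to the right, or pairs vertically
    # with the row directly below it. Yields the set of rows sent right.
    if not free_rows:
        yield frozenset()
        return
    r, rest = free_rows[0], free_rows[1:]
    for b in successors(rest):       # r sends a domino into the next column
        yield b | {r}
    if rest and rest[0] == r + 1:    # r pairs vertically with r + 1
        yield from successors(rest[1:])

def mutilated_board_unsat(n):
    # sigma holds the allowed crossing patterns X_j explicitly, standing in
    # for the characteristic function. Squares (1,1) and (n,n) are removed.
    rows = set(range(1, n + 1))
    sigma = {frozenset()}            # nothing crosses into column 1
    for j in range(1, n + 1):
        missing = {1} if j == 1 else ({n} if j == n else set())
        new_sigma = set()
        for a in sigma:
            if a & missing:          # a domino cannot enter a missing square
                continue
            free = tuple(sorted(rows - missing - a))
            new_sigma |= set(successors(free))
        sigma = new_sigma
    # A tiling exists iff some scan ends with no domino sticking out.
    return frozenset() not in sigma
```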

In our experiments, we found that this scanning reaches a fixed point after processing n/2 columns. That is, from that column onward, the characteristic functions become identical, except for a renaming of variables. This indicates that the set of all possible horizontal configurations stabilizes halfway across the board. Moreover, the BDD representations of the states grow as O(n<sup>2</sup>). For n = 124, the largest has just 3,969 nodes.

One important rule of thumb in symbolic model checking is that the current-state and next-state variables must be adjacent in the variable ordering. Furthermore, the vertical variables in Y<sub>j</sub> must be close to their counterparts in X<sub>j−1</sub> and X<sub>j</sub>. Both objectives can be achieved by ordering the variables row-wise, interleaving the variables x<sub>i,j</sub> and y<sub>i,j</sub>, ordering first by row index i and then by column index j. This requires the quantification operations of Equation (1) to be performed on non-root variables.

Figure 5 shows that the "column-scanning" approach yields performance scaling as n<sup>2.69</sup>, allowing us to handle cases up to n = 124. Keep in mind that the problem size here should be measured as n<sup>2</sup>, the number of squares in the board. Thus, a problem instance with n = 124 is over 31 times larger than one with n = 22 (the upper limit reached by KISSAT), in terms of the number of input variables and clauses. Indeed, the case of n = 22 is straightforward for PGBDD, requiring only a few seconds and generating a proof with 161,694 clauses.<sup>2</sup> By contrast, KISSAT requires 12.6 hours and generates over 136 million clauses.

The plot labeled "No Quantification" demonstrates the importance of including existential quantification in solving this problem. These data were generated by using the same schedule as with column scanning, but with all quantification operations omitted. As can be seen, this approach could not scale beyond n = 14.

Most attempts to generate propositional proofs of the mutilated chessboard have exponential performance. No solver in the 2018 SAT competition could handle the instance with n = 20. Heule, Kiesl, and Biere [19] devised a problem-specific approach that could generate proofs up to n = 50 by exploiting special symmetries in the problem, using a set of rewriting rules to dramatically reduce the search space. Our approach also exploits symmetries in the problem, but it does so by compactly encoding the set of possible configurations between successive columns. Other than these two, we know of no approach for generating polynomially sized propositional proofs for this problem.

#### 4.2 Pigeonhole Problem

The pigeonhole problem is one of the most studied problems in propositional reasoning. Given a set of n holes and a set of n + 1 pigeons, it asks whether there is an assignment of pigeons to holes such that 1) every pigeon is in some hole, and 2) every hole contains at most one pigeon. The answer is no, of course, but any resolution proof for this must be of exponential length [15]. Groote and Zantema have shown that any BDD-based proof of the principle that only uses the Apply algorithm must be of exponential size [14]. On the other hand, Cook constructed an extended resolution proof of size O(n<sup>4</sup>), in part to demonstrate the expressive power of extended resolution [8].

We consider two encodings of the problem. Both are based on a set of variables p<sub>i,j</sub> for 1 ≤ i ≤ n and 1 ≤ j ≤ n + 1, with the interpretation that pigeon j is assigned to hole i. The property that each pigeon j is assigned to some hole can be expressed as a single clause:

$$Pigeon\_j = \bigvee\_{i=1}^n p\_{i,j}$$

<sup>2</sup> All times reported here were measured on a 3 GHz Intel i7-9700 CPU with 16GB of memory.


Fig. 6. Total number of clauses in proofs of the pigeonhole problem for n holes. Using a direct encoding led to exponential performance, but using a Tseitin encoding and column scanning gives proofs that grow as n<sup>3.03</sup>.

Encoding the property that each hole i contains at most one pigeon can be done in two different ways. A *direct* encoding simply states that for any pair of pigeons j and k, at least one of them must not be in hole i:

$$Direct\_i = \bigwedge\_{j=1}^{n+1} \bigwedge\_{k=j+1}^{n+1} \left( \overline{p}\_{i,j} \vee \overline{p}\_{i,k} \right)$$

This encoding requires Θ(n<sup>2</sup>) clauses for each hole, yielding a total CNF size of Θ(n<sup>3</sup>).

A second, *Tseitin* encoding introduces Tseitin variables to track which holes are occupied, starting with pigeon 1 and working upward. We use an encoding published by Sinz [27] that uses Tseitin variables s<sub>i,j</sub> for 1 ≤ i ≤ n and 1 ≤ j ≤ n, where s<sub>i,j</sub> equals 1 if some pigeon j′ with j′ ≤ j occupies hole i. It requires 3n − 1 clauses and n Tseitin variables per hole, yielding an overall CNF size of Θ(n<sup>2</sup>).
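The per-hole at-most-one constraint of the Sinz sequential-counter encoding can be sketched as follows. The helper name and list layout are ours; p holds the pigeon variables for one fixed hole and s the corresponding Tseitin variables, as DIMACS integers.

```python
def at_most_one_seq(p, s):
    # Sinz sequential-counter encoding of AtMostOne over p[1..m], using
    # Tseitin variables s[1..m-1]; s[j] = 1 when some p[j'] with j' <= j is 1.
    # Lists are 1-indexed with index 0 unused. The clause count is 3m - 4,
    # which for m = n + 1 pigeons gives the 3n - 1 clauses per hole cited above.
    m = len(p) - 1
    clauses = []
    for j in range(1, m):
        clauses.append([-p[j], s[j]])        # p_j -> s_j
    for j in range(2, m):
        clauses.append([-s[j - 1], s[j]])    # s_{j-1} -> s_j
    for j in range(2, m + 1):
        clauses.append([-p[j], -s[j - 1]])   # p_j -> not s_{j-1}
    return clauses
```

A brute-force check confirms the intended semantics: an assignment to the p variables extends to the s variables satisfying all clauses exactly when at most one p variable is true.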

As is illustrated by the log-log plots of Figure 6, this choice of encoding not only affects the CNF size, it also dramatically affects the size of the proofs generated by PGBDD. With a direct encoding, we could not find any combination of evaluation strategy and variable ordering that could go beyond n = 16. Similarly, the Tseitin encoding did not help when using linear evaluation or bucket elimination. Indeed, we see KISSAT, using the Tseitin encoding, matching or exceeding our program for these cases, but all of these have exponential performance. (KISSAT could only reach n = 15 when using a direct encoding.)

On the other hand, the column-scanning approach used for the mutilated chessboard can also be applied to the pigeonhole problem when the Tseitin encoding is used. Consider an array with hole i represented by row i and pigeon j represented by column j. Let S<sub>j</sub> represent the Tseitin variables s<sub>i,j</sub> for 1 ≤ i ≤ n. The "state" is then encoded in these Tseitin variables. In processing pigeon j, we can assume that the possible combinations of values of the Tseitin variables S<sub>j−1</sub> are encoded by a characteristic function σ<sub>j−1</sub>(S<sub>j−1</sub>). In addition, we incorporate into this characteristic function the requirement that each pigeon k, for 1 ≤ k ≤ j − 1, is assigned to some hole. Letting P<sub>j</sub> denote the variables p<sub>i,j</sub> for 1 ≤ i ≤ n, the characteristic function at column j can then be expressed as:

$$\sigma\_j(S\_j) = \exists S\_{j-1} \left[ \sigma\_{j-1}(S\_{j-1}) \land \exists P\_j \ T\_j(S\_{j-1}, P\_j, S\_j) \right] \tag{2}$$

where the "transition relation" T<sub>j</sub> consists of the clauses associated with the Tseitin variables, plus the clause encoding constraint Pigeon<sub>j</sub>. As with the mutilated chessboard, having a proper variable ordering is critical to the success of a column-scanning approach. We interleave the ordering of the variables p<sub>i,j</sub> and s<sub>i,j</sub>, ordering them first by i (holes) and then by j (pigeons).

Figure 6 demonstrates the effectiveness of the column-scanning approach. We were able to handle instances up to n = 150, with an overall performance trend of n<sup>3.03</sup>. Our achieved performance therefore improves on Cook's bound of O(n<sup>4</sup>). A SAT-solving method developed by Heule, Kiesl, Seidl, and Biere can generate short proofs of multiple encodings of pigeonhole formulas, including the direct encoding [20]. These proofs are similar to ours after transforming them into the same proof format, and their size is also O(n<sup>3</sup>) [17].

Unlike with the mutilated chessboard, the scanning does not reach a fixed point. Instead, the BDDs start very small, because they must encode the locations of only a small number of occupied holes. They reach their maximum size at pigeon n/2, as the number of combinations for occupied and unoccupied holes reaches its maximum. Then the BDD sizes drop off as the encoding needs to track the positions of a decreasing number of unoccupied holes. Fortunately, all of these BDDs grow quadratically with n, reaching a maximum of 5,702 nodes for n = 150.

#### 4.3 Evaluation

Overall, our results demonstrate the potential for generating small proofs of unsatisfiability using BDDs. We have achieved polynomial performance for problems for which search-based SAT solvers have exponential performance.

Other studies have compared BDDs to search-based SAT on a variety of benchmark problems. Several of these observed exponential performance for BDD-based solvers for problems for which we have obtained polynomial performance. Uribe and Stickel [31] ran experiments with the mutilated chessboard problem, but they did not do any variable quantification. Pan and Vardi [25] applied a variety of scheduling and variable ordering strategies for the mutilated chessboard and pigeonhole problems. Although they were able to get better performance than with a search-based SAT solver, they still observed exponential scaling. Obtaining polynomial performance for these problems requires more problem-specific approaches than the ones they considered.

Table 1 provides some performance data for the largest instances solved for the two benchmark problems. A first observation is that these problems are very large, with tens of thousands of input variables and clauses.


Table 1. Summary data for the largest problems solved

The total number of BDD nodes indicates the total number generated by the function GETNODE, and for which extension variables are created. These number in the millions and far exceed the number of input variables. On the other hand, the maximum number of live nodes shows the effectiveness of garbage collection: at any given point in the program, at most 6% of the total number of nodes must be stored in the unique table and tracked in the operation caches. Garbage collection also keeps the number of clauses that must be tracked by the proof checker below 5% of the total number of clauses. The elapsed time for the SAT solver ranges up to 1.5 hours. We believe, however, that an implementation in a more performant language would reduce these times greatly. The checking times shown are for an LRAT proof checker written in the C programming language. The proofs have also been checked with a formally verified proof checker based on the HOL theorem prover [29].

### 5 Conclusion

Biere, Sinz, and Jussila [21,28] made the critical link between BDDs and extended resolution proofs. We have shown that adding the ability to perform arbitrary existential quantification can greatly increase the performance of a proof-generating, BDD-based SAT solver.

Generating proofs for the two benchmark problems required special insight into their structure and the crafting of evaluation mechanisms to exploit their properties. We believe, however, that the column-scanning approach we employed could be generalized and made more automatic.

The ability to generate correctness proofs in a BDD-based SAT solver invites us to consider generating proofs for other tasks to which BDDs are applied, including QBF solving, model checking, and model counting. Perhaps a proof of unsatisfiability could provide a useful building block for constructing correctness proofs for these other tasks.

### References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

#### Bounded Model Checking for Hyperproperties

#### Tzu-Han Hsu<sup>1</sup>, César Sánchez<sup>2</sup>, and Borzoo Bonakdarpour<sup>1</sup>

<sup>1</sup> Michigan State University, East Lansing, MI, USA, {tzuhan,borzoo}@msu.edu <sup>2</sup> IMDEA Software Institute, Madrid, Spain, cesar.sanchez@imdea.org

Abstract. This paper introduces a bounded model checking (BMC) algorithm for *hyperproperties* expressed in HyperLTL, which — to the best of our knowledge — is the first such algorithm. Just as the classic BMC technique for LTL primarily aims at finding bugs, our approach also targets identifying counterexamples. BMC for LTL is reduced to SAT solving, because LTL describes a property via inspecting individual traces. Our BMC approach naturally reduces to QBF solving, as HyperLTL allows explicit and simultaneous quantification over multiple traces. We report on successful and efficient model checking, implemented in our tool called HyperQube, of a rich set of experiments on a variety of case studies, including security, concurrent data structures, path planning for robots, and mutation testing.

### 1 Introduction

*Hyperproperties* [10] have been shown to be a powerful framework for specifying and reasoning about important classes of requirements that were not possible with trace-based languages such as the classic temporal logics. Examples include information-flow security, consistency models in concurrent computing [6], and robustness models in cyber-physical systems [5, 35]. The temporal logic HyperLTL [9] extends LTL by allowing explicit and simultaneous quantification over execution traces, describing the property of multiple traces. For example, the security policy *observational determinism* can be specified by the HyperLTL formula ∀π<sub>A</sub>.∀π<sub>B</sub>. (o<sub>πA</sub> ↔ o<sub>πB</sub>) W ¬(i<sub>πA</sub> ↔ i<sub>πB</sub>), which stipulates that every pair of traces π<sub>A</sub> and π<sub>B</sub> have to agree on the value of the (public) output o as long as they agree on the value of the (secret) input i, where 'W' denotes the weak until operator.

There has been a recent surge of model checking techniques for HyperLTL specifications [9, 12, 22, 24]. These approaches employ various techniques (e.g., alternating automata, model counting, strategy synthesis, etc.) to verify hyperproperties. However, they generally fall short in proposing a general push-button method for identifying bugs with respect to HyperLTL formulas involving quantifier alternation. Indeed, quantifier alternation has been shown to generally elevate the complexity class of model checking HyperLTL specifications in

This work was funded in part by the United States NSF SaTC Award 2100989, the Madrid Regional Government under project "S2018/TCS-4339 (BLOQUES-CM)", and by Spanish National Project "BOSCO (PGC2018-102210-B-100)".

© The Author(s) 2021

J. F. Groote and K. G. Larsen (Eds.): TACAS 2021, LNCS 12651, pp. 94–112, 2021. https://doi.org/10.1007/978-3-030-72016-2\_6

different shapes of models [2, 9]. For example, consider the simple Kripke structure K in Fig. 1 and the HyperLTL formulas ϕ<sub>1</sub> = ∀π<sub>A</sub>.∀π<sub>B</sub>. □(p<sub>πA</sub> ↔ p<sub>πB</sub>) and ϕ<sub>2</sub> = ∀π<sub>A</sub>.∃π<sub>B</sub>. □(p<sub>πA</sub> ↔ p<sub>πB</sub>). Proving that K |= ϕ<sub>1</sub> (where traces for π<sub>A</sub> and π<sub>B</sub> are taken from K) can be reduced to building the self-composition of K and applying standard LTL model checking, resulting in worst-case complexity |K|<sup>2</sup> in the size of the system. On the contrary, proving that K |= ϕ<sub>2</sub> is not as straightforward. In the worst case, this requires a subset construction to encode the existential quantifier within the Kripke structure, resulting in a |K| · 2<sup>|K|</sup> blowup. In addition, the quantification is over traces rather than states, adding to the complexity of reasoning.

Fig. 1: A Kripke structure.

Following the great success of bounded model checking (BMC) for LTL specifications [8], in this paper we propose a BMC algorithm for HyperLTL. To the best of our knowledge this is the first such algorithm. Just as BMC for LTL is reduced to SAT solving to search for a counterexample trace whose length is bounded by some integer k, we reduce BMC for HyperLTL to QBF solving to be able to deal with quantified counterexample traces in the input model. More formally, given a HyperLTL formula, e.g., ϕ = ∀π<sub>A</sub>.∃π<sub>B</sub>.ψ, and a family of Kripke structures K = (K<sub>A</sub>, K<sub>B</sub>) (one per trace variable), the reduction involves three main components. First, the transition relation of K<sub>π</sub> (for every π) is represented by a Boolean encoding ⟦K<sub>π</sub>⟧. Second, the inner LTL subformula ψ is translated to a Boolean representation ⟦ψ⟧ in a similar fashion to the BMC unrolling technique for LTL. This way, the QBF encoding for a bound k ≥ 0 roughly appears as:

$$[\![\mathcal{K}, \neg \varphi]\!]\_k = \exists \overline{x\_A}. \forall \overline{x\_B}.\ [\![K\_A]\!]\_k \land \left( [\![K\_B]\!]\_k \to [\![\neg \psi]\!]\_k \right) \tag{1}$$

where the vectors of Boolean variables x<sub>A</sub> (respectively, x<sub>B</sub>) are used to represent the states and propositions of K<sub>A</sub> (resp. K<sub>B</sub>) for steps 0 to k. Formulas ⟦K<sub>A</sub>⟧<sub>k</sub> and ⟦K<sub>B</sub>⟧<sub>k</sub> are the unrollings of K<sub>A</sub> (using x<sub>A</sub>) and K<sub>B</sub> (using x<sub>B</sub>), and ⟦¬ψ⟧<sub>k</sub> (which uses both x<sub>A</sub> and x<sub>B</sub>) is the fixpoint Boolean encoding of ¬ψ. The proposed technique in this paper does not incorporate a loop condition, as implementing such a condition for multiple traces is not straightforward. This, of course, comes at the cost of the lack of a completeness result.
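The semantics of this encoding can be illustrated by a brute-force analogue that enumerates bounded unrollings explicitly instead of handing the quantified formula to a QBF solver. The structure and property below are our own toy examples, not the paper's Fig. 1.

```python
from itertools import product

# Toy Kripke structure: from s0, either p holds forever (s1) or never (s2).
K = {
    'init': {'s0'},
    'delta': {'s0': {'s1', 's2'}, 's1': {'s1'}, 's2': {'s2'}},
    'label': {'s0': {'p'}, 's1': {'p'}, 's2': set()},
}

def unrollings(K, k):
    """All state sequences of length k + 1 allowed by the unrolling of K."""
    paths = [[s] for s in K['init']]
    for _ in range(k):
        paths = [p + [t] for p in paths for t in K['delta'][p[-1]]]
    return paths

def violates_agreement(K, k):
    """Negation of forall pi_A. forall pi_B. G (p_A <-> p_B), up to bound k:
    does some pair of bounded unrollings disagree on p at some step?"""
    for a, b in product(unrollings(K, k), repeat=2):
        if any(('p' in K['label'][a[i]]) != ('p' in K['label'][b[i]])
               for i in range(k + 1)):
            return True
    return False
```

At bound k = 0 both traces sit in s0 and agree, so no counterexample exists; at k = 1 the pair (s0 s1, s0 s2) disagrees on p, which is exactly the kind of bounded counterexample the QBF query searches for.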

While our QBF encoding is a natural generalization of BMC for HyperLTL, the first contribution of this paper is a more refined view of how to interpret the behavior of the formula beyond the unrolling depth k. Consider the LTL formula ∀π. □p<sub>π</sub>. BMC for LTL attempts to find a counterexample by unrolling the model and checking for satisfiability of ∃π. ◇¬p<sub>π</sub> up to bound k. Now consider the LTL formula ∀π. ◇p<sub>π</sub>, whose negation is ∃π. □¬p<sub>π</sub>. In the classic BMC, due to its *pessimistic* handling of □, the unsatisfiability of the formula cannot be established in the finite unrolling (handling these formulas requires either a looping condition or reaching the diameter of the system). This is because □¬p<sub>π</sub> is not *sometimes finitely satisfiable* (SFS), in the terminology introduced by Havelund

and Peled [27], meaning that not all satisfying traces of □¬p<sub>π</sub> have a finite prefix that witnesses the satisfiability.

We propose a method that allows to interpret a wide range of outcomes of the QBF solver and relate these to the original model checking decision problem. To this end, we propose the following semantics for BMC for HyperLTL:


We have fully implemented our technique in the tool HyperQube. Our experimental evaluation includes a rich set of case studies, such as information-flow security, linearizability in concurrent data structures, path planning in robotic applications, and mutation testing. Our evaluation shows that our technique is effective and efficient in identifying bugs in several prominent examples. We also show that our QBF-based approach is significantly more efficient than a brute-force SAT-based approach, where universal and existential quantifiers are eliminated by combinatorial expansion into conjunctions and disjunctions. In some cases, our approach can also be used as a tool for synthesis: a witness to an existential quantifier in a HyperLTL formula is an execution path that satisfies the formula. For example, our experiments on path planning for robots showcase this feature of HyperQube.

In summary, the contributions of this paper are as follows. We (1) propose a QBF-based BMC approach for the verification and falsification of HyperLTL specifications; (2) introduce complementary semantics that allow proving and disproving formulas, given a finite set of finite traces; and (3) rigorously analyze the performance of our technique through case studies from different areas of computing.

### 2 Preliminaries

#### 2.1 Kripke Structures

Let $AP$ be a finite set of *atomic propositions* and $\Sigma = 2^{AP}$ the *alphabet*. A *letter* is an element of $\Sigma$. A *trace* $t \in \Sigma^\omega$ over alphabet $\Sigma$ is an infinite sequence of letters: $t = t(0)t(1)t(2)\cdots$

Definition 1. *A* Kripke structure *is a tuple* $K = \langle S, S_{\mathit{init}}, \delta, L \rangle$*, where*

– $S$ is a finite set of states;
– $S_{\mathit{init}} \subseteq S$ is the set of initial states;
– $\delta \subseteq S \times S$ is a total transition relation (every state has at least one successor); and
– $L : S \to \Sigma$ is a labeling function mapping each state to the set of atomic propositions that hold in it.
Fig. 1 shows a Kripke structure, where $S_{\mathit{init}} = \{s_0\}$, $L(s_0) = \{p\}$, $L(s_4) = \{q, \mathit{halt}\}$, etc. The *size* of a Kripke structure is the number of its states. A *loop* in $K$ is a finite sequence $s(0)s(1)\cdots s(n)$ such that $(s(i), s(i+1)) \in \delta$ for all $0 \le i < n$, and $(s(n), s(0)) \in \delta$. We call a Kripke structure *acyclic* if its only loops are self-loops on otherwise terminal states, i.e., on states that have no other outgoing transition. Since Definition 1 does not allow terminal states, we only consider acyclic Kripke structures with such added self-loops. We also label such states with the atomic proposition $\mathit{halt}$.

A *path* of a Kripke structure is an infinite sequence of states $s(0)s(1)\cdots \in S^\omega$ such that $s(0) \in S_{\mathit{init}}$ and $(s(i), s(i+1)) \in \delta$ for all $i \ge 0$. A trace of a Kripke structure is a trace $t(0)t(1)t(2)\cdots \in \Sigma^\omega$ such that there exists a path $s(0)s(1)\cdots \in S^\omega$ with $t(i) = L(s(i))$ for all $i \ge 0$. We denote by $\mathit{Traces}(K, s)$ the set of all traces of $K$ with paths that start in state $s \in S$, and use $\mathit{Traces}(K)$ as a shorthand for $\bigcup_{s \in S_{\mathit{init}}} \mathit{Traces}(K, s)$.
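These definitions can be sketched concretely. The following minimal Python sketch (using a hypothetical acyclic structure loosely modeled on Fig. 1; the exact transitions of the figure are not reproduced here) represents a Kripke structure and enumerates the finite trace prefixes induced by its path prefixes:

```python
# A minimal Kripke-structure sketch (hypothetical example, not the paper's).
class Kripke:
    def __init__(self, states, init, delta, label):
        self.states, self.init, self.delta, self.label = states, init, delta, label

    def paths(self, length):
        """All finite path prefixes of the given length, starting in an initial state."""
        result = []
        def extend(path):
            if len(path) == length:
                result.append(tuple(path))
                return
            for (s, t) in self.delta:
                if s == path[-1]:
                    extend(path + [t])
        for s0 in self.init:
            extend([s0])
        return result

    def trace(self, path):
        """The trace of a path: the sequence of state labels."""
        return tuple(frozenset(self.label[s]) for s in path)

# Acyclic structure with self-loops on terminal states, labeled 'halt'.
K = Kripke(
    states={"s0", "s1", "s2", "s3", "s4"},
    init={"s0"},
    delta={("s0", "s1"), ("s1", "s2"), ("s1", "s3"), ("s2", "s4"),
           ("s3", "s3"), ("s4", "s4")},       # self-loops on terminal states
    label={"s0": {"p"}, "s1": {"p"}, "s2": {"p"},
           "s3": {"p", "halt"}, "s4": {"q", "halt"}},
)

prefixes = {K.trace(p) for p in K.paths(4)}
```

Here every infinite trace of the structure is determined by a finite prefix followed by the repetition of a halting state, which is exactly the situation the halting semantics of Section 3 exploits.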

#### 2.2 The Temporal Logic HyperLTL

*Syntax.* HyperLTL [9] is an extension of the linear-time temporal logic (LTL) for hyperproperties. The syntax of HyperLTL formulas is defined inductively by the following grammar:

$$\begin{aligned} \varphi &::= \exists \pi. \varphi \mid \forall \pi. \varphi \mid \phi \\ \phi &::= \text{true} \mid a\_{\pi} \mid \neg \phi \mid \phi \lor \phi \mid \phi \land \phi \mid \phi \mathcal{U} \,\phi \mid \phi \,\mathcal{R} \,\phi \mid \bigcirc \phi \end{aligned}$$

where $a \in AP$ is an atomic proposition and $\pi$ is a *trace variable* from an infinite supply of variables $\mathcal{V}$. The Boolean connectives $\neg$, $\lor$, and $\land$ have the usual meaning, $\mathcal{U}$ is the temporal *until* operator, $\mathcal{R}$ is the temporal *release* operator, and $\bigcirc$ is the temporal *next* operator. We also consider the derived Boolean connectives $\to$ and $\leftrightarrow$, and the derived temporal operators *eventually* $\Diamond\phi \equiv \text{true}\ \mathcal{U}\ \phi$ and *globally* $\square\phi \equiv \neg\Diamond\neg\phi$. Even though this set of operators is not minimal, we introduce it to make the treatment uniform with the variants in Section 3. The quantified formulas $\exists\pi$ and $\forall\pi$ are read as "along some trace $\pi$" and "along all traces $\pi$", respectively. A formula is *closed* (i.e., a *sentence*) if all trace variables used in the formula are quantified. We assume, without loss of generality, that no variable is quantified twice. We use $\mathit{Vars}(\varphi)$ for the set of trace variables used in formula $\varphi$.

*Semantics.* An interpretation $\mathcal{T} = \langle T_\pi \rangle_{\pi \in \mathit{Vars}(\varphi)}$ of a formula $\varphi$ consists of a tuple of sets of traces, with one set $T_\pi$ per trace variable $\pi$ in $\mathit{Vars}(\varphi)$, denoting the set of traces assigned to $\pi$. Note that we allow quantifiers to range over different models. We will use this feature in the verification of hyperproperties such as linearizability, where different quantifiers are associated with different sets of executions (in this case, one for the concurrent implementation and one for the sequential implementation). That is, each set of traces comes from a Kripke structure, and we use $\mathbb{K} = \langle K_\pi \rangle_{\pi \in \mathit{Vars}(\varphi)}$ to denote a *family* of Kripke structures, so $T_\pi = \mathit{Traces}(K_\pi)$ is the set of traces that $\pi$ can range over, which comes from $K_\pi$. Abusing notation, we write $\mathcal{T} = \mathit{Traces}(\mathbb{K})$. Note that picking a single $K$ and letting $K_\pi = K$ for all $\pi$ is a particular case, which leads to the original semantics of HyperLTL [9].

Our semantics of HyperLTL is defined with respect to a trace assignment, which is a partial map $\Pi : \mathit{Vars}(\varphi) \rightharpoonup \Sigma^\omega$. The assignment with the empty domain is denoted by $\Pi_\emptyset$. Given a trace assignment $\Pi$, a trace variable $\pi$, and a concrete trace $t \in \Sigma^\omega$, we denote by $\Pi[\pi \mapsto t]$ the assignment that coincides with $\Pi$ everywhere but at $\pi$, which is mapped to trace $t$. The satisfaction of a HyperLTL formula $\varphi$ is a binary relation $\models$ that associates a formula to the models $(\mathcal{T}, \Pi, i)$, where $i \in \mathbb{Z}_{\ge 0}$ is a pointer that indicates the current evaluation position. The semantics is defined as follows:

$$\begin{array}{lll}
(\mathcal{T}, \Pi, 0) \models \exists\pi.\ \psi & \text{iff} & \text{there is a } t \in T_\pi \text{ such that } (\mathcal{T}, \Pi[\pi \mapsto t], 0) \models \psi,\\
(\mathcal{T}, \Pi, 0) \models \forall\pi.\ \psi & \text{iff} & \text{for all } t \in T_\pi,\ (\mathcal{T}, \Pi[\pi \mapsto t], 0) \models \psi,\\
(\mathcal{T}, \Pi, i) \models \text{true}, & &\\
(\mathcal{T}, \Pi, i) \models a_\pi & \text{iff} & a \in \Pi(\pi)(i),\\
(\mathcal{T}, \Pi, i) \models \neg\psi & \text{iff} & (\mathcal{T}, \Pi, i) \not\models \psi,\\
(\mathcal{T}, \Pi, i) \models \psi_1 \lor \psi_2 & \text{iff} & (\mathcal{T}, \Pi, i) \models \psi_1 \text{ or } (\mathcal{T}, \Pi, i) \models \psi_2,\\
(\mathcal{T}, \Pi, i) \models \psi_1 \land \psi_2 & \text{iff} & (\mathcal{T}, \Pi, i) \models \psi_1 \text{ and } (\mathcal{T}, \Pi, i) \models \psi_2,\\
(\mathcal{T}, \Pi, i) \models \bigcirc\psi & \text{iff} & (\mathcal{T}, \Pi, i+1) \models \psi,\\
(\mathcal{T}, \Pi, i) \models \psi_1\ \mathcal{U}\ \psi_2 & \text{iff} & \text{there is a } j \ge i \text{ for which } (\mathcal{T}, \Pi, j) \models \psi_2\\
& & \text{and } (\mathcal{T}, \Pi, k) \models \psi_1 \text{ for all } k \in [i, j),\\
(\mathcal{T}, \Pi, i) \models \psi_1\ \mathcal{R}\ \psi_2 & \text{iff} & \text{either } (\mathcal{T}, \Pi, j) \models \psi_2 \text{ for all } j \ge i, \text{ or,}\\
& & \text{for some } j \ge i,\ (\mathcal{T}, \Pi, j) \models \psi_1\\
& & \text{and } (\mathcal{T}, \Pi, k) \models \psi_2 \text{ for all } k \in [i, j].
\end{array}$$

This semantics is slightly different from the definition in [9], but equivalent (see [30]). We say that an interpretation $\mathcal{T}$ satisfies a sentence $\varphi$, denoted by $\mathcal{T} \models \varphi$, if $(\mathcal{T}, \Pi_\emptyset, 0) \models \varphi$. We say that a family of Kripke structures $\mathbb{K}$ satisfies a sentence $\varphi$, denoted by $\mathbb{K} \models \varphi$, if $\langle \mathit{Traces}(K_\pi) \rangle_{\pi \in \mathit{Vars}(\varphi)} \models \varphi$. When the same Kripke structure $K$ is used for all trace variables, we write $K \models \varphi$. For example, the Kripke structure in Fig. 1 satisfies the HyperLTL formula $\varphi = \forall\pi_A. \exists\pi_B.\ \square\Diamond(p_{\pi_A} \leftrightarrow p_{\pi_B})$.

### 3 Bounded Semantics for HyperLTL

We now introduce the bounded semantics of HyperLTL, which is used in Section 4 to generate queries to a QBF solver that help solve the model checking problem.

### 3.1 Bounded Semantics

We assume the HyperLTL formula is closed and of the form $\mathbb{Q}_A\pi_A. \mathbb{Q}_B\pi_B. \ldots \mathbb{Q}_Z\pi_Z.\ \psi$, where $\mathbb{Q} \in \{\forall, \exists\}$, and that it has been converted into negation-normal form (NNF), so that the negation symbol only appears in front of atomic propositions, e.g., $\neg a_{\pi_A}$. Without loss of generality, and for the sake of clarity with respect to other numerical indices, we use Roman letters as indices of trace variables. Thus, we assume that $\mathit{Vars}(\varphi) \subseteq \{\pi_A, \pi_B, \ldots, \pi_Z\}$. The main idea of BMC is to perform an incremental exploration of the state space of the systems by unrolling the systems and the formula up to a bound. Let $k \ge 0$ be the unrolling *bound* and let $\mathcal{T} = \langle T_A, \ldots, T_Z \rangle$ be a tuple of sets of traces, one per trace variable. We start by defining a satisfaction relation between HyperLTL formulas for a bounded exploration $k$ and models $(\mathcal{T}, \Pi, i)$, where $\mathcal{T}$ is the tuple of sets of traces, $\Pi$ is a trace assignment mapping (as defined in Section 2), and $i \in \mathbb{Z}_{\ge 0}$ points to a position in the traces. We define four finite satisfaction relations $\models^*_k$ (for $* = \mathit{pes}, \mathit{opt}, \mathit{hpes}, \mathit{hopt}$):


All these semantics coincide in the interpretation of quantifiers, Boolean connectives, and temporal operators up to instant $k-1$, but differ in their assumptions about unseen future events beyond the bound of observation $k$.

Quantifiers. The satisfaction relation for the quantifiers is the following:

$$(\mathcal{T}, \Pi, 0) \models^*_k \exists\pi.\ \psi \quad \text{iff} \quad \text{there is a } t \in T_\pi \text{ such that } (\mathcal{T}, \Pi[\pi \mapsto t], 0) \models^*_k \psi, \tag{1}$$

$$(\mathcal{T}, \Pi, 0) \models^*_k \forall\pi.\ \psi \quad \text{iff} \quad \text{for all } t \in T_\pi,\ (\mathcal{T}, \Pi[\pi \mapsto t], 0) \models^*_k \psi. \tag{2}$$

Boolean operators. For every <sup>i</sup> <sup>≤</sup> <sup>k</sup>, we have:

$$(\mathcal{T}, \Pi, i) \models^*_k \text{true}, \tag{3}$$

$$(\mathcal{T}, \Pi, i) \models^*_k a_\pi \quad \text{iff} \quad a \in \Pi(\pi)(i), \tag{4}$$

$$(\mathcal{T}, \Pi, i) \models^*_k \neg a_\pi \quad \text{iff} \quad a \notin \Pi(\pi)(i), \tag{5}$$

$$(\mathcal{T}, \Pi, i) \models^*_k \psi_1 \lor \psi_2 \quad \text{iff} \quad (\mathcal{T}, \Pi, i) \models^*_k \psi_1 \text{ or } (\mathcal{T}, \Pi, i) \models^*_k \psi_2, \tag{6}$$

$$(\mathcal{T}, \Pi, i) \models^*_k \psi_1 \land \psi_2 \quad \text{iff} \quad (\mathcal{T}, \Pi, i) \models^*_k \psi_1 \text{ and } (\mathcal{T}, \Pi, i) \models^*_k \psi_2. \tag{7}$$

Temporal connectives. The case where (i<k) is common between the optimistic and pessimistic semantics:

$$(\mathcal{T}, \Pi, i) \models^*_k \bigcirc\psi \quad \text{iff} \quad (\mathcal{T}, \Pi, i+1) \models^*_k \psi, \tag{8}$$

$$(\mathcal{T}, \Pi, i) \models^*_k \psi_1\ \mathcal{U}\ \psi_2 \quad \text{iff} \quad (\mathcal{T}, \Pi, i) \models^*_k \psi_2, \text{ or } (\mathcal{T}, \Pi, i) \models^*_k \psi_1 \text{ and } (\mathcal{T}, \Pi, i+1) \models^*_k \psi_1\ \mathcal{U}\ \psi_2, \tag{9}$$

$$(\mathcal{T}, \Pi, i) \models^*_k \psi_1\ \mathcal{R}\ \psi_2 \quad \text{iff} \quad (\mathcal{T}, \Pi, i) \models^*_k \psi_2, \text{ and } (\mathcal{T}, \Pi, i) \models^*_k \psi_1 \text{ or } (\mathcal{T}, \Pi, i+1) \models^*_k \psi_1\ \mathcal{R}\ \psi_2. \tag{10}$$

For $(i = k)$, in the pessimistic semantics the eventualities (including $\bigcirc$) are assumed to never be fulfilled in the future, so the current instant $k$ is the last chance:

$$(\mathcal{T}, \Pi, i) \models^{pes}_k \bigcirc\psi \quad \text{iff} \quad \text{never happens}, \tag{$P_1$}$$

$$(\mathcal{T}, \Pi, i) \models^{pes}_k \psi_1\ \mathcal{U}\ \psi_2 \quad \text{iff} \quad (\mathcal{T}, \Pi, i) \models^{pes}_k \psi_2, \tag{$P_2$}$$

$$(\mathcal{T}, \Pi, i) \models^{pes}_k \psi_1\ \mathcal{R}\ \psi_2 \quad \text{iff} \quad (\mathcal{T}, \Pi, i) \models^{pes}_k \psi_1 \land \psi_2. \tag{$P_3$}$$

On the other hand, in the optimistic semantics the eventualities are assumed to be fulfilled in the future:

$$(\mathcal{T}, \Pi, i) \models^{opt}_k \bigcirc\psi \quad \text{iff} \quad \text{always happens}, \tag{$O_1$}$$

$$(\mathcal{T}, \Pi, i) \models^{opt}_k \psi_1\ \mathcal{U}\ \psi_2 \quad \text{iff} \quad (\mathcal{T}, \Pi, i) \models^{opt}_k \psi_1 \lor \psi_2, \tag{$O_2$}$$

$$(\mathcal{T}, \Pi, i) \models^{opt}_k \psi_1\ \mathcal{R}\ \psi_2 \quad \text{iff} \quad (\mathcal{T}, \Pi, i) \models^{opt}_k \psi_2. \tag{$O_3$}$$

To capture the halting semantics, we use the predicate $\mathit{halt}$, which is true if the state corresponds to a halting state (self-loop), and define $\mathit{halted} \stackrel{\text{def}}{=} \bigwedge_{\pi \in \mathit{Vars}(\varphi)} \mathit{halt}_\pi$, which holds whenever all traces have halted (so their final states will be repeated ad infinitum). The halting semantics of the temporal operators for $i = k$ in the pessimistic case then uses the halting information to infer the actual value of the temporal operators on the (now fully known) traces:

$$\begin{array}{llll}
(\mathcal{T}, \Pi, i) \models^{hpes}_k \bigcirc\psi & \text{iff} & (\mathcal{T}, \Pi, i) \models^*_k \mathit{halted} \text{ and } (\mathcal{T}, \Pi, i) \models^{hpes}_k \psi, & (HP_1)\\
(\mathcal{T}, \Pi, i) \models^{hpes}_k \psi_1\ \mathcal{U}\ \psi_2 & \text{iff} & (\mathcal{T}, \Pi, i) \models^{hpes}_k \psi_2, & (HP_2)\\
(\mathcal{T}, \Pi, i) \models^{hpes}_k \psi_1\ \mathcal{R}\ \psi_2 & \text{iff} & (\mathcal{T}, \Pi, i) \models^{hpes}_k \psi_1 \land \psi_2, \text{ or} & \\
& & (\mathcal{T}, \Pi, i) \models^*_k \mathit{halted} \text{ and } (\mathcal{T}, \Pi, i) \models^{hpes}_k \psi_2. & (HP_3)
\end{array}$$

Dually, in the halting optimistic case:

$$\begin{array}{llll}
(\mathcal{T}, \Pi, i) \models^{hopt}_k \bigcirc\psi & \text{iff} & (\mathcal{T}, \Pi, i) \not\models^*_k \mathit{halted} \text{ or } (\mathcal{T}, \Pi, i) \models^{hopt}_k \psi, & (HO_1)\\
(\mathcal{T}, \Pi, i) \models^{hopt}_k \psi_1\ \mathcal{U}\ \psi_2 & \text{iff} & (\mathcal{T}, \Pi, i) \models^{hopt}_k \psi_2, \text{ or} & \\
& & (\mathcal{T}, \Pi, i) \not\models^*_k \mathit{halted} \text{ and } (\mathcal{T}, \Pi, i) \models^{hopt}_k \psi_1, & (HO_2)\\
(\mathcal{T}, \Pi, i) \models^{hopt}_k \psi_1\ \mathcal{R}\ \psi_2 & \text{iff} & (\mathcal{T}, \Pi, i) \models^{hopt}_k \psi_2. & (HO_3)
\end{array}$$

Complete semantics. We are now ready to define the four semantics:

– Pessimistic semantics: $\models^{pes}_k$ uses rules (1)-(10) and $(P_1)$-$(P_3)$.
– Optimistic semantics: $\models^{opt}_k$ uses rules (1)-(10) and $(O_1)$-$(O_3)$.
– Halting pessimistic semantics: $\models^{hpes}_k$ uses rules (1)-(10) and $(HP_1)$-$(HP_3)$.
– Halting optimistic semantics: $\models^{hopt}_k$ uses rules (1)-(10) and $(HO_1)$-$(HO_3)$.
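To make the four relations concrete, here is a small reference evaluator (a Python sketch; the tuple encoding of NNF formulas and the convention that a trace has halted when its letter at position $k$ contains *halt* are ours). It applies rules (8)-(10) below the bound and the $P$/$O$/$HP$/$HO$ rules at $i = k$:

```python
# Bounded-semantics evaluator for quantifier-free NNF formulas over finite
# trace prefixes t(0..k). Formulas are nested tuples (our own encoding), e.g.
# ("U", ("true",), ("ap", "q", "A")) for  true U q_piA  (i.e., eventually q).
def holds(f, tr, i, k, sem):
    op = f[0]
    if op == "true":  return True
    if op == "false": return False
    if op == "ap":    return f[1] in tr[f[2]][i]
    if op == "nap":   return f[1] not in tr[f[2]][i]
    if op == "or":    return holds(f[1], tr, i, k, sem) or holds(f[2], tr, i, k, sem)
    if op == "and":   return holds(f[1], tr, i, k, sem) and holds(f[2], tr, i, k, sem)
    if i < k:  # rules (8)-(10): common to all four semantics below the bound
        if op == "X": return holds(f[1], tr, i + 1, k, sem)
        if op == "U": return holds(f[2], tr, i, k, sem) or (
            holds(f[1], tr, i, k, sem) and holds(f, tr, i + 1, k, sem))
        if op == "R": return holds(f[2], tr, i, k, sem) and (
            holds(f[1], tr, i, k, sem) or holds(f, tr, i + 1, k, sem))
    # i == k: the semantics differ (rules P1-P3, O1-O3, HP1-HP3, HO1-HO3)
    halted = all("halt" in t[k] for t in tr.values())
    p1 = lambda: holds(f[1], tr, k, k, sem)
    p2 = lambda: holds(f[2], tr, k, k, sem)
    if sem == "pes":
        if op == "X": return False
        return p2() if op == "U" else p1() and p2()
    if sem == "opt":
        if op == "X": return True
        return p1() or p2() if op == "U" else p2()
    if sem == "hpes":
        if op == "X": return halted and p1()
        return p2() if op == "U" else (p1() and p2()) or (halted and p2())
    if op == "X": return (not halted) or p1()                      # hopt
    return p2() or ((not halted) and p1()) if op == "U" else p2()

# Globally p, i.e. false R p, on a halted trace that always satisfies p:
tA = {"A": ({"p"}, {"p"}, {"p"}, {"p", "halt"})}
Gp = ("R", ("false",), ("ap", "p", "A"))
```

On this trace, $\square p$ is false pessimistically but true optimistically and in the halting pessimistic semantics, matching the intuition that formulas only get truer (pessimistic) or falser (optimistic) as the unrolling grows.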

#### 3.2 The Logical Relation between Different Semantics

Observe that the pessimistic semantics is the semantics used in traditional BMC for LTL. In the pessimistic semantics, a formula is declared false unless it is witnessed to be true within the bound explored. In other words, formulas can only get "truer" with the information obtained from a longer unrolling. Dually, the optimistic semantics considers a formula true unless there is evidence to the contrary within the bounded exploration. Therefore, formulas can only get "falser" with further unrolling. For example, formula $\square p$ always evaluates to false in the pessimistic semantics. In the optimistic semantics, it evaluates to true up to bound $k$ if $p$ holds in all states of the trace up to and including $k$. However, if the formula evaluates to false at some bound $k$, then it evaluates to false for all $j \ge k$. The following lemma formalizes this intuition for HyperLTL.

Lemma 1. *Let* $k \le j$*. Then:*

1. *If* $(\mathcal{T}, \Pi, 0) \models^{pes}_k \varphi$*, then* $(\mathcal{T}, \Pi, 0) \models^{pes}_j \varphi$*.*
2. *If* $(\mathcal{T}, \Pi, 0) \not\models^{opt}_k \varphi$*, then* $(\mathcal{T}, \Pi, 0) \not\models^{opt}_j \varphi$*.*
3. *If* $(\mathcal{T}, \Pi, 0) \models^{hpes}_k \varphi$*, then* $(\mathcal{T}, \Pi, 0) \models^{hpes}_j \varphi$*.*
4. *If* $(\mathcal{T}, \Pi, 0) \not\models^{hopt}_k \varphi$*, then* $(\mathcal{T}, \Pi, 0) \not\models^{hopt}_j \varphi$*.*

In turn, the verdict obtained from the exploration up to $k$ can (in some cases) be used to infer the verdict of the model checking problem. As in classical BMC, if the pessimistic semantics finds a model, then it is indeed a model. Dually, if the optimistic semantics fails to find a model, then there is no model. The next lemma formally captures this intuition.

Lemma 2 (Infinite inference). *The following hold for every* $k$*:*

1. *For* $* = \mathit{pes}, \mathit{hpes}$*: if* $(\mathcal{T}, \Pi, 0) \models^{*}_k \varphi$*, then* $(\mathcal{T}, \Pi, 0) \models \varphi$*.*
2. *For* $* = \mathit{opt}, \mathit{hopt}$*: if* $(\mathcal{T}, \Pi, 0) \not\models^{*}_k \varphi$*, then* $(\mathcal{T}, \Pi, 0) \not\models \varphi$*.*
*Example 1.* Consider the Kripke structure in Fig. 1, bound $k = 3$, and formula $\varphi_1 = \forall\pi_A. \exists\pi_B.\ \neg(p_{\pi_A} \leftrightarrow p_{\pi_B})\ \mathcal{R}\ \neg q_{\pi_A}$. It is easy to see that instantiating $\pi_A$ with trace $s_0s_1s_2s_4$ falsifies $\varphi_1$ in the pessimistic semantics. By Lemma 2, this counterexample shows that the Kripke structure is a model of $\neg\varphi_1$ in the infinite semantics as well. That is, $\mathbb{K} \models^{pes}_3 \neg\varphi_1$ and, hence, $\mathbb{K} \models \neg\varphi_1$, so $\mathbb{K} \not\models \varphi_1$.

Consider again the same Kripke structure, bound $k = 3$, and formula $\varphi_2 = \forall\pi_A. \exists\pi_B.\ \Diamond(p_{\pi_A} \leftrightarrow q_{\pi_B})$. To disprove $\varphi_2$, we would need to find a trace $\pi_A$ such that, for all $\pi_B$, proposition $q$ in $\pi_B$ always disagrees with $p$ in $\pi_A$. It is straightforward to observe that such a trace $\pi_A$ does not exist. By Lemma 2, the fact that the formula is not satisfiable up to bound 3 in the optimistic semantics implies that $\mathbb{K}$ is not a model of $\neg\varphi_2$ in the infinite semantics. That is, $\mathbb{K} \not\models^{opt}_3 \neg\varphi_2$ implies $\mathbb{K} \not\models \neg\varphi_2$. Hence, we conclude $\mathbb{K} \models \varphi_2$.

Consider again the same Kripke structure, which has two terminating states, $s_3$ and $s_4$, labeled with atomic proposition $\mathit{halt}$ and carrying only a self-loop. Let $k = 3$ and $\varphi_3 = \forall\pi_A. \exists\pi_B.\ (\neg q_{\pi_B}\ \mathcal{U}\ \neg p_{\pi_A})$. Instantiating $\pi_A$ with trace $s_0s_1s_3$, which is of the form $\{p\}^\omega$, satisfies $\neg\varphi_3$. By Lemma 2, fulfillment in the halting pessimistic semantics implies fulfillment in the infinite semantics as well. That is, $\mathbb{K} \models^{hpes}_3 \neg\varphi_3$ implies $\mathbb{K} \models \neg\varphi_3$. Hence, $\mathbb{K} \not\models \varphi_3$.

Consider again the same Kripke structure with halting states and formula $\varphi_4 = \forall\pi_A. \exists\pi_B.\ \square\Diamond(p_{\pi_A} \leftrightarrow p_{\pi_B})$. A counterexample would be an instantiation of $\pi_A$ such that, for all $\pi_B$, the two traces do not always eventually agree on $p$. Consider trace $s_0s_1s_2s_4$, which is of the form $\{p\}\{p\}\{p\}\{q, \mathit{halt}\}^\omega$, with $k = 3$. This trace never agrees (from position 3 on) with a trace that ends in state $s_3$ (which is of the form $\{p\}^\omega$) and vice versa, but it does always eventually agree with traces that also end in $s_4$, so it is not a counterexample. By Lemma 2, the absence of a counterexample up to bound 3 in the halting optimistic semantics implies that $\mathbb{K}$ is not a model of $\neg\varphi_4$ in the infinite semantics. That is, $\mathbb{K} \not\models^{hopt}_3 \neg\varphi_4$ implies $\mathbb{K} \not\models \neg\varphi_4$. Hence, we conclude $\mathbb{K} \models \varphi_4$.

### 4 Reducing BMC to QBF Solving

Given a family of Kripke structures $\mathbb{K}$, a HyperLTL formula $\varphi$, and a bound $k \ge 0$, our goal is to construct a QBF formula $[\mathbb{K}, \varphi]_k$ whose satisfiability can be used to infer whether or not $\mathbb{K} \models \varphi$.

In the following paragraphs, we first describe how to encode the model and the formula, and then how to combine the two to generate the QBF query. We illustrate the constructions using formula $\varphi_1$ from Example 1 in Section 3, whose negation is $\exists\pi_A. \forall\pi_B.\ \neg\psi$ with $\neg\psi = (p_{\pi_A} \leftrightarrow p_{\pi_B})\ \mathcal{U}\ q_{\pi_A}$.

*Encoding the models.* The unrolling of the transition relation of a Kripke structure $K_A = \langle S, S_{\mathit{init}}, \delta, L \rangle$ up to bound $k$ is analogous to the BMC encoding for LTL [8]. First, note that the state space $S$ can be encoded with a number of bits logarithmic in $|S|$. We introduce additional variables $n_0, n_1, \ldots$ to encode the state of the Kripke structure and use $AP^* = AP \cup \{n_0, n_1, \ldots\}$ for the extended alphabet that includes the encoding of $S$. In this manner, the set of initial states of a Kripke structure is a Boolean formula over $AP^*$. For example, for the Kripke structure $K_A$ in Fig. 1, the set of initial states (in this case $S_{\mathit{init}} = \{s_0\}$) corresponds to the following Boolean formula:

$$I\_A := \left(\neg n\_0 \land \neg n\_1 \land \neg n\_2\right) \land p \land \neg q \land \neg halt$$

assuming that $(\neg n_0 \land \neg n_1 \land \neg n_2)$ represents state $s_0$ (we need three bits to encode five states). Similarly, $R_A$ is a binary relation that encodes the transition relation $\delta$ of $K_A$ (representing the relation between a state and its successor). The encoding into QBF works by introducing fresh Boolean variables (a new copy of $AP^*$ for each Kripke structure $K_A$ and position), and then producing a Boolean formula that encodes the unrolling up to $k$. We use $x^i_A$ for the set of fresh copies of the variables $AP^*$ of $K_A$ corresponding to position $i \in [0, k]$. Therefore, there are $(k+1)|x_A| = (k+1)|AP^*_A|$ Boolean variables to represent the unrolling of $K_A$. We use $I_A(x)$ for the Boolean formula (over variables from $x$) that encodes the initial states, and $R_A(x, x')$ (for two copies $x$ and $x'$ of the variables) for the Boolean formula that encodes whether $x'$ is a successor of $x$. For example, for $k = 3$, we unroll the transition relation up to 3 as follows,

$$[K_A]_3 = I_A(x^0_A) \land R_A(x^0_A, x^1_A) \land R_A(x^1_A, x^2_A) \land R_A(x^2_A, x^3_A)$$

which is the Boolean formula representing the valid traces of length 4, using four copies of the variables $AP^*_A$ that represent the Kripke structure $K_A$.
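This unrolling is mechanical. The following Python sketch (the variable-naming scheme and the toy predicates $I_A$, $R_A$ are ours, not the paper's) builds $I_A(x^0_A) \land \bigwedge_{i < k} R_A(x^i_A, x^{i+1}_A)$ as a formula string:

```python
# Build the unrolling [K]_k = I(x^0) & R(x^0,x^1) & ... & R(x^{k-1},x^k)
# as a formula string; the "A_2" suffix names the copy of AP* at position 2.
def unroll(I, R, name, k):
    copy = lambda i: f"{name}_{i}"            # name of the i-th variable copy
    parts = [I(copy(0))] + [R(copy(i), copy(i + 1)) for i in range(k)]
    return " & ".join(f"({p})" for p in parts)

# Hypothetical predicates for a toy one-bit structure (illustration only).
I_A = lambda x: f"!n0_{x} & p_{x}"
R_A = lambda x, y: f"n0_{y} <-> !n0_{x}"

formula = unroll(I_A, R_A, "A", 3)
# formula uses four copies of the variables: A_0 .. A_3
```

In an actual implementation these strings would be clauses handed to a QBF solver, but the shape of the conjunction is exactly that of $[K_A]_3$ above.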

*Encoding the inner LTL formula.* The construction for the inner LTL formula is analogous to standard BMC as well, except for the choice among the different semantics described in Section 3. In particular, we introduce the following inductive construction and define four different unrollings for a given $k$: $[\cdot]^{pes}_{i,k}$, $[\cdot]^{opt}_{i,k}$, $[\cdot]^{hpes}_{i,k}$, and $[\cdot]^{hopt}_{i,k}$.

– Inductive Case: Since the semantics only differ on the temporal operators at the end of the unrolling, the inductive case is common to all unrollings, and we use $[\cdot]^*_{i,k}$ to mean any of the choices of semantics (for $* = \mathit{pes}, \mathit{opt}, \mathit{hpes}, \mathit{hopt}$). For all $i \le k$:

$$\begin{array}{lcl}
[p_\pi]^*_{i,k} &:=& p^i_\pi\\[2pt]
[\neg p_\pi]^*_{i,k} &:=& \neg p^i_\pi\\[2pt]
[\psi_1 \lor \psi_2]^*_{i,k} &:=& [\psi_1]^*_{i,k} \lor [\psi_2]^*_{i,k}\\[2pt]
[\psi_1 \land \psi_2]^*_{i,k} &:=& [\psi_1]^*_{i,k} \land [\psi_2]^*_{i,k}\\[2pt]
[\psi_1\ \mathcal{U}\ \psi_2]^*_{i,k} &:=& [\psi_2]^*_{i,k} \lor \big([\psi_1]^*_{i,k} \land [\psi_1\ \mathcal{U}\ \psi_2]^*_{i+1,k}\big)\\[2pt]
[\psi_1\ \mathcal{R}\ \psi_2]^*_{i,k} &:=& [\psi_2]^*_{i,k} \land \big([\psi_1]^*_{i,k} \lor [\psi_1\ \mathcal{R}\ \psi_2]^*_{i+1,k}\big)\\[2pt]
[\bigcirc\psi]^*_{i,k} &:=& [\psi]^*_{i+1,k}
\end{array}$$

Note that, for a given trace variable $\pi_A$, the atom $p^i_{\pi_A}$ that results from $[p_{\pi_A}]^*_{i,k}$ is one of the Boolean variables in $x^i_A$.

– Base Case:
$$\begin{array}{ll}
[\psi]^{pes}_{k+1,k} := \text{false} & \qquad [\psi]^{opt}_{k+1,k} := \text{true}\\[2pt]
[\psi]^{hpes}_{k+1,k} := [\mathit{halted}]^{hpes}_{k,k} \land [\psi]^{hpes}_{k,k} & \qquad [\psi]^{hopt}_{k+1,k} := [\mathit{halted}]^{hopt}_{k,k} \to [\psi]^{hopt}_{k,k}
\end{array}$$

Note that the base case defines the value to be assumed for the formula after the end $k$ of the unrolling, which is what the temporal operators in the inductive case expand into at position $k$. The pessimistic semantics assumes the formula to be false, and the optimistic semantics assumes it to be true. The halting cases consider whether the traces have halted (using the evaluation at $k$ in that case) and use the corresponding non-halting choice otherwise.
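As a concrete sketch, the inductive and base cases translate directly into a recursive encoder. The following Python fragment (our own tuple encoding of NNF formulas and atom-naming scheme; restricted to the pessimistic and optimistic base cases for brevity) emits $[\psi]^*_{i,k}$ as a formula string:

```python
# Emit [psi]^*_{i,k} as a propositional formula string; the atom ("ap","q","A")
# at step i becomes the Boolean variable "q_A_i" (naming scheme is ours).
def enc(f, i, k, sem):
    if i > k:                      # base case beyond the end of the unrolling
        return "false" if sem == "pes" else "true"
    op = f[0]
    if op == "ap":  return f"{f[1]}_{f[2]}_{i}"
    if op == "nap": return f"!{f[1]}_{f[2]}_{i}"
    if op == "or":  return f"({enc(f[1], i, k, sem)} | {enc(f[2], i, k, sem)})"
    if op == "and": return f"({enc(f[1], i, k, sem)} & {enc(f[2], i, k, sem)})"
    if op == "X":   return enc(f[1], i + 1, k, sem)
    if op == "U":   # [psi2] | ([psi1] & [psi1 U psi2]_{i+1,k})
        return f"({enc(f[2], i, k, sem)} | ({enc(f[1], i, k, sem)} & {enc(f, i + 1, k, sem)}))"
    if op == "R":   # [psi2] & ([psi1] | [psi1 R psi2]_{i+1,k})
        return f"({enc(f[2], i, k, sem)} & ({enc(f[1], i, k, sem)} | {enc(f, i + 1, k, sem)}))"
    raise ValueError(f"unknown operator {op}")

# p_piA U q_piA, unrolled with k = 1:
until = ("U", ("ap", "p", "A"), ("ap", "q", "A"))
pes_unrolled = enc(until, 0, 1, "pes")
```

The fixpoint recursion bottoms out at position $k + 1$, where the two base cases differ only in the constant they substitute, so the pessimistic and optimistic unrollings of the same formula are identical except for a `false`/`true` leaf.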

*Example 2.* Consider again the formula $\neg\psi = (p_{\pi_A} \leftrightarrow p_{\pi_B})\ \mathcal{U}\ q_{\pi_A}$. Using the pessimistic semantics, the unrolling $[\neg\psi]^{pes}_{0,3}$ with three steps is

$$q^0_{\pi_A} \lor \Big( (p^0_{\pi_A} \leftrightarrow p^0_{\pi_B}) \land \Big( q^1_{\pi_A} \lor \Big( (p^1_{\pi_A} \leftrightarrow p^1_{\pi_B}) \land \big( q^2_{\pi_A} \lor \big( (p^2_{\pi_A} \leftrightarrow p^2_{\pi_B}) \land q^3_{\pi_A} \big) \big) \Big) \Big) \Big)$$

In this encoding, the collection $x^2_A$ contains a copy of all variables of $AP^*$ of $K_A$ (that is, $\{p^2_{\pi_A}, q^2_{\pi_A}, \ldots\}$), connected to the corresponding valuation of $p_{\pi_A}$ at step 2 of the unrolling of $K_A$. In other words, the formula $[\neg\psi]^{pes}_{0,3}$ uses variables from $x^0_A, x^1_A, x^2_A, x^3_A$ and $x^0_B, x^1_B, x^2_B, x^3_B$ (that is, from $\overline{x_A}$ and $\overline{x_B}$).

*Combining the encodings.* Now, let $\varphi$ be a HyperLTL formula of the form $\varphi = \mathbb{Q}_A\pi_A. \mathbb{Q}_B\pi_B. \ldots \mathbb{Q}_Z\pi_Z.\ \psi$ and $\mathbb{K} = \langle K_A, K_B, \ldots, K_Z \rangle$. Combining all the components, the encoding of the HyperLTL BMC problem in QBF is the following (for $* = \mathit{pes}, \mathit{opt}, \mathit{hpes}, \mathit{hopt}$):

$$[\mathbb{K}, \varphi]^*_k = \mathbb{Q}_A \overline{x_A}.\ \mathbb{Q}_B \overline{x_B} \ldots \mathbb{Q}_Z \overline{x_Z}.\ \Big( [K_A]_k \circ_A [K_B]_k \circ_B \cdots [K_Z]_k \circ_Z [\psi]^*_{0,k} \Big)$$

where $[\psi]^*_{0,k}$ is the choice of semantics, $\circ_j = \land$ if $\mathbb{Q}_j = \exists$, and $\circ_j = \to$ if $\mathbb{Q}_j = \forall$, for $j \in \mathit{Vars}(\varphi)$, with the operators associating to the right.
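Concretely, the prenex prefix and the matrix can be assembled as follows (a Python sketch; the placeholder strings standing in for the sub-encodings are ours):

```python
# Assemble the QBF query: quantify each block of variable copies, and fold the
# model unrollings into the matrix with & for exists and -> for forall.
def combine(blocks, inner):
    """blocks: list of (quantifier, name, model_encoding), outermost first."""
    matrix = inner
    for q, name, model in reversed(blocks):
        op = "&" if q == "exists" else "->"
        matrix = f"({model} {op} {matrix})"
    prefix = " ".join(f"{q} x_{name}." for q, name, _ in blocks)
    return f"{prefix} {matrix}"

query = combine([("exists", "A", "[K_A]_3"), ("forall", "B", "[K_B]_3")],
                "[neg_psi]^pes_0,3")
```

Folding from the innermost quantifier outward reproduces the right-associated nesting: an existential block conjoins its model (the witness must be a valid trace), while a universal block guards the rest with an implication (only valid traces matter).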

*Example 3.* Consider again Example 2. To combine the model description with the encoding of the HyperLTL formula, we use two identical copies of the given Kripke structure to represent the different paths $\pi_A$ and $\pi_B$ in the model, denoted $K_A$ and $K_B$. The final resulting formula is:

$$[\mathbb{K}, \neg\varphi]_3 := \exists \overline{x_A}.\ \forall \overline{x_B}.\ \Big( [K_A]_3 \land \big( [K_B]_3 \to [\neg\psi]^{pes}_{0,3} \big) \Big)$$

The sequence of assignments $(\neg n_2, \neg n_1, \neg n_0, p, \neg q, \neg\mathit{halt})^0$, $(\neg n_2, \neg n_1, n_0, p, \neg q, \neg\mathit{halt})^1$, $(\neg n_2, n_1, \neg n_0, p, \neg q, \neg\mathit{halt})^2$, $(n_2, \neg n_1, \neg n_0, \neg p, q, \mathit{halt})^3$ on $K_A$, corresponding to the path $s_0s_1s_2s_4$, satisfies $[\neg\psi]^{pes}_{0,3}$ for all traces of $K_B$. This shows that $[\mathbb{K}, \neg\varphi]^{pes}_3$ is satisfiable, indicating that a witness of the violation has been found. Theorem 1, by the successful detection of a counterexample witness under the pessimistic semantics, allows us to conclude that $\mathbb{K} \not\models \varphi$.

The main result of this section is Theorem 1, which connects the output of the solver to the original model checking problem. We first show an auxiliary lemma.

Lemma 3. *Let* $\varphi$ *be a closed HyperLTL formula and let* $\mathcal{T} = \textit{Traces}(\mathbb{K})$ *be an interpretation. For* $* \in \{$*pes*, *opt*, *hpes*, *hopt*$\}$*, it holds that*

> $[\mathbb{K}, \varphi]_k^*$ *is satisfiable if and only if* $(\mathcal{T}, \Pi_\emptyset, 0) \models_k^* \varphi$.

*Proof (sketch).* The proof proceeds in two steps. First, let ψ be the largest quantifier-free sub-formula of ϕ. Every tuple of traces of length $k$ (one for each $\pi$) is in one-to-one correspondence with an assignment to the collection of variables $p_\pi^i$, such that the tuple is a model of ψ (in the chosen semantics) if and only if the corresponding assignment satisfies $[\psi]_{0,k}^*$. Second, we show by induction on the stack of quantifiers that each subformula obtained by adding a quantifier is satisfiable if and only if the corresponding semantics holds.

Lemma 3, together with Lemma 2, allows us to infer the outcome of the model checking problem from satisfiable (or unsatisfiable) instances of QBF queries, as summarized in the following theorem.

Theorem 1. *Let* $\varphi$ *be a HyperLTL formula. Then:*
*1. For* $* \in \{$pes, hpes$\}$*, if* $[\mathbb{K}, \neg\varphi]_k^*$ *is satisfiable, then* $\mathbb{K} \not\models \varphi$*.*
*2. For* $* \in \{$opt, hopt$\}$*, if* $[\mathbb{K}, \neg\varphi]_k^*$ *is unsatisfiable, then* $\mathbb{K} \models \varphi$*.*

Table 1 illustrates what Theorem 1 allows us to soundly conclude from the output of the QBF solver about the model checking problem for the formulas of Example 1 in Section 3.
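The conclusions licensed by Theorem 1 amount to a small lookup, in the spirit of Table 1. The function below is a hypothetical sketch (not part of HyperQube); the semantics tags and answer strings are invented names.

```python
# Sketch of the sound conclusions Theorem 1 licenses from the QBF solver's
# answer on the encoding of K and the *negated* formula, [K, ~phi]_k^*.

def conclude(semantics, solver_answer):
    """semantics: one of 'pes', 'opt', 'hpes', 'hopt';
    solver_answer: 'SAT' or 'UNSAT' on [K, ~phi]_k^*."""
    if semantics in ("pes", "hpes") and solver_answer == "SAT":
        return "K does not satisfy phi (counterexample found)"
    if semantics in ("opt", "hopt") and solver_answer == "UNSAT":
        return "K satisfies phi"
    # Any other combination is not covered by Theorem 1 at this bound.
    return "inconclusive at this bound"
```

For instance, `conclude("pes", "SAT")` corresponds to the bug-finding use of the pessimistic semantics, while `conclude("hopt", "UNSAT")` corresponds to full verification via the halting optimistic semantics.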

### 5 Evaluation and Case Studies

We now evaluate our approach on a rich set of case studies on information-flow security, concurrent data structures, path planning for robots, and mutation testing. In this section, we refer to each HyperLTL property by its name in Table 2.


Table 1: Comparison of Properties with Different Semantics

We have implemented the technique described in Section 4 in our tool HyperQube. Given a transition relation, the tool automatically unfolds it up to a bound $k \geq 0$ using a home-grown procedure written in OCaml, called genqbf. Given the choice of semantics (pessimistic, optimistic, or a halting variant), the unfolded transition relation is combined with the QBF encoding of the input HyperLTL formula to form a complete QBF instance, which is then fed to the QBF solver QuAbS [28]. All experiments in this section were run on an iMac desktop with an Intel i7 CPU @ 3.4 GHz and 32 GB of RAM. A full description of the systems and formulas used can be found in the longer version of this paper [30].

Case Study 1: Symmetry in Lamport's Bakery algorithm [12]. Symmetry states that no specific process has special privileges in terms of faster access to the critical section (see the different symmetry formulas in Table 2). In these formulas, each process $P_n$ has a program counter denoted $pc(P_n)$; select indicates which process is chosen to proceed next; pause holds if neither process is selected; sym\_break records which process is selected after a tie; and $sym(select_{\pi_A}, select_{\pi_B})$ indicates whether two traces select opposite processes. The Bakery algorithm does not satisfy symmetry (i.e., $\varphi_{sym1}$), because when two or more processes try to enter the critical section with the same ticket number, the algorithm always gives priority to the process with the smaller process ID. HyperQube returns SAT using the pessimistic semantics, indicating that there exists a counterexample in the form of a falsifying witness for $\pi_A$ in formula $\varphi_{sym1}$. Table 3 includes our results on the other symmetry formulas presented in Table 2.

Case Study 2: Linearizability in SNARK [14]. SNARK implements a concurrent double-ended queue using double-compare-and-swap (DCAS) and a doubly linked list that stores a value in each node. *Linearizability* [29] requires that any history of execution of a concurrent data structure (i.e., sequence of *invocation* and *response* events by different threads) matches some sequential order of invocations and responses (see formula $\varphi_{lin}$ in Table 2). SNARK is known to have two linearizability bugs, and HyperQube returns SAT using the pessimistic semantics, identifying both bugs as two counterexamples. The bugs we identified are precisely the same as the ones reported in [14].


Table 2: Hyperproperties investigated in case studies.

Case Study 3: Non-interference in multi-threaded programs. *Non-interference* [25] states that low-security variables are independent of high-security variables, thus preserving secure information flow. We consider the concurrent program example in [32], where PIN is a high-security input and Result is a low-security output. HyperQube returns SAT in the halting pessimistic semantics, indicating that there is a trace on which a difference in the high-security variable can be detected by observing a low-security variable, thereby violating non-interference. We also verified the correctness of a fix to this algorithm, likewise proposed in [32]. HyperQube uses the UNSAT result from the solver (with the halting optimistic semantics) to infer the absence of a violation, that is, to verify *non-interference*.

Case Study 4: Fairness in non-repudiation protocols. A *non-repudiation* protocol ensures, through a trusted third party, that the receiver obtains a receipt from the sender, called *non-repudiation of origin* (NRO), and that the sender ends up with evidence, called *non-repudiation of receipt* (NRR). A non-repudiation protocol is *fair* if either both NRR and NRO are received or neither is received by the respective parties (see formula $\varphi_{fair}$ in Table 2). We verified two different protocols from [31]: $T_{incorrect}$, which chooses not to send out NRR after receiving NRO, and a correct implementation $T_{correct}$, which is fair. For $T_{correct}$ (respectively, $T_{incorrect}$), HyperQube returns UNSAT in the halting optimistic semantics (respectively, SAT in the halting pessimistic semantics), indicating that the protocol satisfies (respectively, violates) fairness.

Case Study 5: Path planning for robots. We have used HyperQube beyond verification, to synthesize strategies for robotic planning [34]. Here, we focus on producing a strategy that satisfies two control requirements for a robot to reach a goal in a grid. First, the robot should take the *shortest path* (see formula $\varphi_{sp}$ in Table 2). Fig. 2 shows a 10×10 grid, where the red, green, and black cells are the initial, goal, and blocked cells, respectively. HyperQube returns SAT, and the synthesized path is shown by the blue arrows. We also used HyperQube to solve the *path robustness* problem, meaning that starting from an arbitrary initial state, a robot reaches the goal by following a single strategy (see formula $\varphi_{rb}$ in Table 2). Again, HyperQube returns SAT for the grid shown in Fig. 3.

Fig. 2: Shortest Path

Fig. 3: Robust path

Case Study 6: Mutation testing. We adopted the model from [15] and applied the original formula that describes a good test mutant together with the model (see formula $\varphi_{mut}$ in Table 2). HyperQube returns SAT, indicating that a qualified mutant was successfully found. We note that in [15] the authors were not able to generate test cases via $\varphi_{mut}$, as the model checker MCHyper cannot handle quantifier alternation in a push-button fashion.

Results and analysis. Table 3 summarizes our results, including running times, the bounded semantics applied, the output of the QBF solver, and the conclusion about the (unbounded) model checking problem inferred via Theorem 1. As can be seen, our case studies range over model checking of different fragments of HyperLTL. Note that HyperQube's running time consists of generating a QBF formula with genqbf and then checking its satisfiability with QuAbS. Remarkably, in some cases generating the QBF formula takes longer than checking its satisfiability. The models in our experiments also have different sizes. The most complex case study is arguably the SNARK algorithm, where we identify the two bugs in the algorithm in 472 and 1497 seconds, respectively. In cases 5.1 – 6.2, we also demonstrate the ability of HyperQube to solve synthesis problems by leveraging the existential quantifier in a HyperLTL formula.

Finally, we elaborate on the scalability of the path planning problem for robots. This problem was first studied in [34], where the authors reduce it to SMT solving with Z3 [13], eliminating the trace quantifiers through a combinatorial enumeration of conjunctions and disjunctions. Table 4 compares our approach with the brute-force technique employed in [34] for different grid sizes. Our QBF-based approach clearly outperforms the solution in [34], in some cases by an order of magnitude.


Table 3: Performance of HyperQube, where column *case#* identifies the artifact, ✓ denotes satisfaction, and ✗ denotes violation of the formula. $AP^*$ is the set of Boolean variables encoding $\mathbb{K}$.


Table 4: Path planning for robots and comparison to [34]. All cases use the halting pessimistic semantics and QBF solver returns SAT, meaning successful path synthesis.

### 6 Related Work

There has been a lot of recent progress in automatically verifying [12,22–24] and monitoring [1,6,7,20,21,26,33] HyperLTL specifications. HyperLTL is also supported by a growing set of tools, including the model checker MCHyper [12,24], the satisfiability checkers EAHyper [19] and MGHyper [17], and the runtime monitoring tool RVHyper [20]. The complexity of *model checking* HyperLTL for tree-shaped, acyclic, and general graphs was rigorously investigated in [2]. The first algorithms for model checking HyperLTL and HyperCTL$^*$ using alternating automata were introduced in [24]. These techniques, however, could not in practice handle alternating HyperLTL formulas in a fully automated fashion. We also note that previous approaches that reduce model checking HyperLTL (typically for formulas without quantifier alternation) to model checking LTL can use BMC in the LTL model checking phase. However, this differs from the approach presented here: those approaches simply instruct the model checker to use BMC *after* the problem has been fully reduced to an LTL model checking problem, whereas we avoid this translation. The automata-based algorithms were later extended to deal with hyperliveness and alternating formulas in [12] by finding a winning strategy in ∀∃ games. In this paper, we take an alternative approach and reduce the model checking problem to QBF solving, which is arguably more effective for finding bugs (in case a finite witness exists).

The *satisfiability* problem for HyperLTL is undecidable in general, and in fact for any fragment that includes a ∀∃ quantifier alternation, but it is decidable for the $\exists^*\forall^*$ fragment [16]. The hierarchy of hyperlogics beyond HyperLTL was studied in [11]. The synthesis problem for HyperLTL has been studied in [3] in the form of *program repair*, in [4] in the form of *controller synthesis*, and in [18] for the general case.

### 7 Conclusion and Future Work

We introduced the first bounded model checking (BMC) technique for the verification of hyperproperties expressed in HyperLTL. To this end, we proposed four different semantics that ensure the soundness of inferring the outcome of the model checking problem. To handle trace quantification in HyperLTL, we reduced the BMC problem to checking satisfiability of quantified Boolean formulas (QBF), analogous to the reduction of BMC for LTL to plain propositional satisfiability. Beyond the pessimistic semantics common in LTL model checking, we introduced *optimistic* semantics, which allow us to infer full verification from observing only a finite prefix, and *halting* variants of both, which additionally exploit the termination of the execution, when available. Through a rich set of case studies, we demonstrated the effectiveness and efficiency of our approach in the verification of information-flow properties, linearizability in concurrent data structures, path planning in robotics, and fairness in non-repudiation protocols.

As for future work, our first step is to solve the loop condition problem. This is necessary to establish completeness conditions for BMC and can help cover even more examples efficiently. The application of QBF-based techniques in the framework of abstraction/refinement is another unexplored area. The success of BMC for hyperproperties inherently depends on the effectiveness of QBF solvers. Even though QBF solving is not yet as mature as SAT/SMT solving, recent breakthroughs in QBF have enabled the construction of our tool HyperQube, and further progress in QBF solving will improve its efficiency.

### References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.


### **Counterexample-Guided Prophecy for Model Checking Modulo the Theory of Arrays**

Makai Mann<sup>1</sup>(✉), Ahmed Irfan<sup>1</sup>, Alberto Griggio<sup>2</sup>, Oded Padon<sup>1,3</sup>, and Clark Barrett<sup>1</sup>

<sup>1</sup> Stanford University, Stanford, USA {makaim,irfan,barrett}@cs.stanford.edu <sup>2</sup> Fondazione Bruno Kessler, Trento, Italy griggio@fbk.eu

<sup>3</sup> VMware Research, Palo Alto, USA oded.padon@gmail.com

**Abstract.** We develop a framework for model checking infinite-state systems by automatically augmenting them with auxiliary variables, enabling quantifier-free induction proofs for systems that would otherwise require quantified invariants. We combine this mechanism with a counterexample-guided abstraction refinement scheme for the theory of arrays. Our framework can thus, in many cases, reduce inductive reasoning with quantifiers and arrays to quantifier-free and array-free reasoning. We evaluate the approach on a wide set of benchmarks from the literature. The results show that our implementation often outperforms state-of-the-art tools, demonstrating its practical potential.

### **1 Introduction**

Model checking is a widely-used and highly-effective technique for automated property checking. While model checking finite-state systems is a well-established technique for hardware and software systems, model checking infinite-state systems is more challenging. One challenge, for example, is that proving properties by induction over infinite-state systems often requires the use of universally quantified invariants. While some automated reasoning tools can reason about quantified formulas, such reasoning is typically not very robust. Furthermore, just discovering these quantified invariants remains very challenging.

Previous work (e.g., [52]) has shown that prophecy variables can sometimes play the same role as universally quantified variables, making it possible to transform a system that would require quantified reasoning into one that does not. However, to the best of our knowledge, there has been no automatic method for applying such transformations. In this paper, we introduce a technique we call counterexample-guided prophecy. During the refinement step of an abstraction-refinement loop, our technique automatically introduces prophecy variables, which both help with the refinement step and may also reduce the need for quantified reasoning. We demonstrate the technique in the context of model checking for infinite-state systems with arrays, a domain which is known for requiring quantified reasoning. We show how a standard abstraction for arrays can be augmented with counterexample-guided prophecy to obtain an algorithm that reduces the model checking problem to quantifier-free, array-free reasoning.

© The Author(s) 2021

J. F. Groote and K. G. Larsen (Eds.): TACAS 2021, LNCS 12651, pp. 113–132, 2021. https://doi.org/10.1007/978-3-030-72016-2_7

The paper makes the following contributions: i) we introduce an algorithm called **Prophecize** which uses history and prophecy variables to target a specific term at a specific time step of an execution, producing a new transition system that can effectively reason universally about that term; ii) we develop an automatic abstraction-refinement procedure for arrays, which leverages the **Prophecize** algorithm during the refinement step, and show that it is sound and produces no false positives; iii) we develop a prototype implementation of our technique; and iv) we evaluate our technique on four sets of model checking benchmarks containing arrays and show that our implementation outperforms state-of-the-art tools on a majority of the benchmark sets.

### **2 Background**

We assume the standard many-sorted first-order logical setting with the usual notions of signature, term, formula, and interpretation. A *theory* is a pair $T = (\Sigma, \mathbf{I})$ where $\Sigma$ is a signature and $\mathbf{I}$ is a class of $\Sigma$-interpretations, the *models* of $T$. A $\Sigma$-formula $\varphi$ is *satisfiable* (resp., *unsatisfiable*) *in* $T$ if it is satisfied by some (resp., no) interpretation in $\mathbf{I}$. Given an interpretation $\mathcal{M}$, a variable assignment $s$ over a set of variables $X$ is a mapping that assigns each variable $x \in X$ of sort $\sigma$ to an element of $\sigma^{\mathcal{M}}$, denoted $x^s$. We write $\mathcal{M}[s]$ for the interpretation that is equivalent to $\mathcal{M}$ except that each variable $x \in X$ is mapped to $x^s$. Let $x$ be a variable, $t$ a term, and $\varphi$ a formula. We denote by $\varphi\{x \mapsto t\}$ the formula obtained by replacing every free occurrence of $x$ in $\varphi$ with $t$. We extend this notation to sets of variables and terms in the usual way. If $f$ and $g$ are two functions, we write $f \circ g$ for functional composition, i.e., $(f \circ g)(x) = f(g(x))$.

Let T<sup>A</sup> be the standard theory of arrays [50] with extensionality, extended with constant arrays. Concretely, we assume sorts for arrays, indices, and elements, and function symbols read, write, and constarr . Here and below, we use a and b to refer to arrays, i and j to refer to array indices, and e and c to refer to array elements, where c is also restricted to be an interpreted constant. The theory contains the class of all interpretations satisfying the following axioms:

$$\begin{aligned} \forall a, i, j, e.\; &(i = j \implies read(write(a, j, e), i) = e)\ \wedge \\ &(i \neq j \implies read(write(a, j, e), i) = read(a, i)) \end{aligned} \tag{write}$$

$$\forall a, b. \left(\forall i. \,\,read(a, i) = read(b, i)\right) \implies a = b \tag{ext}$$

$$\forall i. \; read(constarr(c), i) = c \tag{const}$$
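The axioms can be exercised against a simple executable model of arrays, here a Python dict with a default element. This model is only an illustration of what the axioms say, not the formal semantics of $T_A$; the representation is invented for the example.

```python
# Toy model of the array theory: an array is a default element plus a
# finite map of exceptions. read/write/constarr mirror the theory symbols.

def constarr(c):
    return {"default": c, "map": {}}

def read(a, i):
    return a["map"].get(i, a["default"])

def write(a, j, e):
    return {"default": a["default"], "map": {**a["map"], j: e}}

a = write(constarr(0), 3, 7)
# (write) axiom, both cases:
assert read(write(a, 5, 9), 5) == 9           # i = j: read the written value
assert read(write(a, 5, 9), 3) == read(a, 3)  # i != j: unchanged elsewhere
# (const) axiom:
assert read(constarr(42), 11) == 42
```

The (ext) axiom is the one this model cannot check by finite evaluation, since it quantifies over all indices; that is precisely the kind of quantified reasoning the paper's abstraction aims to avoid.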

**Symbolic Transition Systems and Model Checking.** For generality, we assume a background theory $T$ with signature $\Sigma$. We assume that all terms and formulas are $\Sigma$-terms and $\Sigma$-formulas, that entailment is entailment modulo $T$, and that interpretations are $T$-interpretations. A symbolic transition system (STS) $\mathcal{S}$ is a tuple $\mathcal{S} := \langle X, I, T \rangle$, where $X$ is a finite set of state variables, $I(X)$ is a formula denoting the initial states of the system, and $T(X, X')$ is a formula expressing the transition relation. Here, $X'$ is the set obtained by replacing each variable $x \in X$ with a new variable $x'$ of the same sort. Let $prime(x) = x'$ be the bijection corresponding to this replacement. We say that a variable $x$ is frozen if $T \models x' = x$. When the state variables are obvious, we often drop $X$.

A state $s$ of $\mathcal{S}$ is a variable assignment over $X$. An execution of $\mathcal{S}$ of length $k$ is a pair $\langle \mathcal{M}, \pi \rangle$, where $\mathcal{M}$ is an interpretation and $\pi := s_0, s_1, \ldots, s_{k-1}$ is a path of length $k$, i.e., a sequence of states such that $\mathcal{M}[s_0] \models I(X)$ and $\mathcal{M}[s_i][s_{i+1} \circ prime^{-1}] \models T(X, X')$ for all $0 \leq i < k - 1$. When reasoning about paths, it is often convenient to have multiple copies of the state variables $X$. We use $X@n$ to denote the set of variables obtained by replacing each variable $x \in X$ with a new variable called $x@n$ of the same sort. We refer to these as timed variables. A state $s$ is reachable in $\mathcal{S}$ if it appears in a path of some execution of $\mathcal{S}$. We say that a formula $P(X)$ is an invariant of $\mathcal{S}$, denoted $\mathcal{S} \models P(X)$, if $P(X)$ is satisfied in every reachable state of $\mathcal{S}$ (i.e., for every execution $\langle \mathcal{M}, \pi \rangle$, $\mathcal{M}[s] \models P(X)$ for each $s$ in $\pi$). The invariant checking problem is, given $\mathcal{S}$ and $P(X)$, to determine whether $\mathcal{S} \models P(X)$. A counterexample is an execution $\langle \mathcal{M}, \pi \rangle$ of $\mathcal{S}$ of length $k$ such that $\mathcal{M}[s_{k-1}] \not\models P(X)$. If $I(X) \models \varphi(X)$ and $\varphi(X) \wedge T(X, X') \models \varphi(X')$, then $\varphi(X)$ is an inductive invariant. Every inductive invariant is an invariant (by induction over path length). In this paper we focus on model checking problems where $I$, $T$, and $P$ are quantifier-free. However, a quantified inductive invariant might still be necessary to prove a property of the system.

Bounded Model Checking (BMC) is a bug-finding technique that attempts to find a counterexample of length $k$ for a property $P(X)$, for some finite $k$ [9]. A single BMC query at bound $k$ for an invariant property uses a constraint solver to check the satisfiability of the following formula: $BMC(\mathcal{S}, P, k) := I(X@0) \wedge \big( \bigwedge_{i=0}^{k-1} T(X@i, X@(i+1)) \big) \wedge \neg P(X@k)$. If the query is satisfiable, there is a bug.
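To make the shape of the BMC query concrete, here is a brute-force stand-in for the constraint solver on a toy finite-domain system, a counter modulo 8 with the (false) claimed invariant $x \neq 5$. The system, domain, and all names are invented for illustration; a real implementation would hand the timed formula to a SAT/SMT solver instead of enumerating paths.

```python
# Brute-force BMC: enumerate all length-(k+1) state sequences and check
# I(X@0), the k transition constraints, and ~P(X@k).
from itertools import product

DOMAIN = range(8)
I = lambda x: x == 0                   # initial states
T = lambda x, xp: xp == (x + 1) % 8    # transition relation
P = lambda x: x != 5                   # claimed invariant (false here)

def bmc(k):
    """Return a state sequence witnessing ~P at step k, or None."""
    for path in product(DOMAIN, repeat=k + 1):
        if I(path[0]) \
                and all(T(path[i], path[i + 1]) for i in range(k)) \
                and not P(path[k]):
            return path
    return None

assert bmc(3) is None                   # no counterexample of length 3
assert bmc(5) == (0, 1, 2, 3, 4, 5)     # bug found exactly at bound 5
```

As in the symbolic setting, an unsatisfiable query at bound $k$ (here, `None`) says nothing about larger bounds: the bug only appears once $k$ reaches 5.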

**Counterexample-Guided Abstraction Refinement (CEGAR).** CEGAR is a general technique in which a difficult conjecture is tackled iteratively [44]. Algorithm 1 shows a simple CEGAR loop for checking an invariant $P$ of an STS $\mathcal{S}$. It is parameterized by three functions. The **Abstract** function produces an initial abstraction of the problem. It must satisfy the contract that if $\langle \hat{\mathcal{S}}, \hat{P} \rangle = \textbf{Abstract}(\mathcal{S}, P)$, then $\hat{\mathcal{S}} \models \hat{P} \implies \mathcal{S} \models P$. The next function is **Prove**. This can be any (unbounded) model-checking algorithm that can return counterexamples. It checks whether a given property is an invariant of a given STS. If it is, it returns with proven set to true. Otherwise, it returns a bound $k$ at which a counterexample exists. The final function is **Refine**. It takes the abstracted STS and property together with a bound $k$ at which a known counterexample for the abstract STS exists. Its job is to refine the abstraction until there is no longer a counterexample of size $k$. If it succeeds, it returns the new STS and property. It fails if there is an actual counterexample of size $k$ for the concrete system; in this case, it sets the return value refined to false.
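The CEGAR loop of Algorithm 1 (shown as a figure in the paper and not reproduced here) can be sketched as a small skeleton parameterized by the three functions. The return shapes below are illustrative assumptions that mirror the contracts stated in the text, not the paper's exact interfaces.

```python
# CEGAR skeleton. Assumed shapes:
#   abstract(S, P)        -> (S_hat, P_hat)
#   prove(S_hat, P_hat)   -> (proven, k)   # k: bound of an abstract cex
#   refine(S_hat, P_hat, k) -> (refined, S_hat', P_hat', cex)

def cegar(S, P, abstract, prove, refine):
    S_hat, P_hat = abstract(S, P)       # sound: S_hat |= P_hat implies S |= P
    while True:
        proven, k = prove(S_hat, P_hat)
        if proven:
            return True, None           # P is an invariant of S
        refined, S_hat, P_hat, cex = refine(S_hat, P_hat, k)
        if not refined:
            return False, cex           # concrete counterexample of length k
```

A trivial instantiation (where each refinement strictly strengthens the abstraction until **Prove** succeeds) terminates with `(True, None)`; if **Refine** reports a concrete counterexample, the loop exits with `(False, cex)` instead.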

**Auxiliary variables.** We finish this section with relevant background on auxiliary variables, a crucial part of the refinement step described in Sec. 4. Auxiliary variables are new variables added to the system which do not influence its behavior (i.e., the reduct to the old set of variables of any reachable state in the new system is a reachable state in the old system), but may assist in proofs. We consider two main categories of auxiliary variables: history and prophecy. History variables, also known as ghost state, preserve a value, making its past value available in future states. Prophecy variables are the dual of history variables and provide a way to refer to a value that occurs in a future state. Abadi and Lamport formally characterized soundness conditions for the introduction of history and prophecy variables [1]. Here, we consider a simple, structured form of history variables.

**Definition 1.** Let $\mathcal{S} = \langle X, I, T \rangle$ be an STS, $t$ a term whose free variables are in $X$, and $n > 0$. Then $\textit{Delay}(\mathcal{S}, t, n)$ returns a new STS and variable $\langle \langle X^h, I^h, T^h \rangle, h_t^n \rangle$, where $X^h = X \cup \{h_t^1, \ldots, h_t^n\}$, $I^h = I$, and $T^h = T \wedge (h_t^{1\prime} = t) \wedge \bigwedge_{i=2}^{n} (h_t^{i\prime} = h_t^{i-1})$.

The **Delay** operator makes the current value of a term $t$ available for the next $n$ states in a path. This is accomplished by adding $n$ new history variables and creating an assignment chain that passes the value to the next history variable at each state. Thus, $h_t^k$ contains the value that $t$ had $k$ states ago. The initial value of each history variable is unconstrained.
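Since **Delay** is a purely syntactic transformation, it can be sketched on a toy STS representation: a record of variables plus constraint strings, with a trailing apostrophe marking next-state variables. This encoding and the variable-naming scheme are invented for illustration.

```python
# Sketch of Delay(S, t, n): add n history variables h1_t, ..., hn_t and
# chain them so hi holds the value t had i states ago.

def delay(S, t, n):
    """S: {'vars': set, 'init': [str], 'trans': [str]}; t: term as a string."""
    assert n > 0
    hs = [f"h{i}_{t}" for i in range(1, n + 1)]
    trans = list(S["trans"])
    trans.append(f"{hs[0]}' = {t}")                 # h1' = t
    trans += [f"{hs[i]}' = {hs[i - 1]}" for i in range(1, n)]  # hi' = h(i-1)
    Sh = {"vars": S["vars"] | set(hs),
          "init": list(S["init"]),                  # history vars unconstrained
          "trans": trans}
    return Sh, hs[-1]                               # new system and h^n_t

S = {"vars": {"x"}, "init": ["x = 0"], "trans": ["x' = x + 1"]}
Sh, h2 = delay(S, "x", 2)
# h2 ("h2_x") holds the value x had two states ago.
assert "h1_x' = x" in Sh["trans"] and "h2_x' = h1_x" in Sh["trans"]
```

Note that, as in Definition 1, the initial-state formula is left untouched, so the history variables start with arbitrary values and only become meaningful after enough steps.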

**Theorem 1.** Let $\mathcal{S} = \langle X, I, T \rangle$ be an STS, $P$ a property, and $\textit{Delay}(\mathcal{S}, v, n) = \langle \mathcal{S}^h, h_v^n \rangle$. Then $\mathcal{S} \models P$ iff $\mathcal{S}^h \models P$.

We refer to [1] for a general proof which subsumes Theorem 1. In contrast to the general approach for history variables, we use a version of prophecy that only requires a single frozen variable. The motivation for this is that a frozen variable can be used in place of a universal quantifier, as the following theorem adapted from [52] shows.

**Theorem 2.** Let $\mathcal{S} = \langle X, I, T \rangle$ be an STS, $x$ a variable in formula $P(X)$, and $v$ a fresh variable (i.e., not in $X$ or $X'$). Let $\mathcal{S}^p = \langle X \cup \{v\}, I, T \wedge v' = v \rangle$. Then $\mathcal{S} \models \forall x.\, P(X)$ iff $\mathcal{S}^p \models P(X)\{x \mapsto v\}$.

Theorem 2 shows that a universally quantified variable in an invariant can be replaced with a fresh symbol in a process similar to Skolemization. The intuition is as follows. The frozen variable has the same value in all states, but it is uninitialized by $I$. Thus, for each path in $\mathcal{S}$, there is a corresponding path (i.e., identical except at $v$) in $\mathcal{S}^p$ for every possible value of $v$. This proliferation of paths plays the same role as the quantified variable in $P$. We mention here one more theorem from [52]. This one allows us to introduce a universal quantifier.

# **Algorithm 2 Prophecize**$(\langle X, I, T \rangle, P(X), t, n)$

1: **if** $n = 0$ **then**
2: &nbsp;&nbsp;**return** $\langle \langle X \cup \{p_t\},\, I,\, T \wedge p_t' = p_t \rangle,\; p_t = t \implies P(X),\; p_t \rangle$
3: **else**
4: &nbsp;&nbsp;$\langle \langle X^h, I^h, T^h \rangle, h_t^n \rangle := \textbf{Delay}(\langle X, I, T \rangle, t, n)$
5: &nbsp;&nbsp;**return** $\langle \langle X^h \cup \{p_t^n\},\, I^h,\, T^h \wedge p_t^{n\prime} = p_t^n \rangle,\; p_t^n = h_t^n \implies P(X),\; p_t^n \rangle$
6: **end if**

**Theorem 3.** Let $\mathcal{S} = \langle X, I, T \rangle$ be an STS, $P(X)$ a formula, and $t$ a term. Then $\mathcal{S} \models P(X)$ iff $\mathcal{S} \models \forall y.\, (y = t \implies P(X))$, where $y$ is not free in $P(X)$.

Theorems 2 and 3 are special cases of Theorems 3 and 4 of [52]. The original theorems handle the more general case where P(X) can be a temporal formula.

### **3 Using Auxiliary Variables to Assist Induction**

We can use Theorem 3 followed by Theorem 2 to introduce frozen prophecy variables that predict the value of a term $t$ when the property $P$ is being checked. We refer to $t$ as the prophecy target and to the process as universal prophecy. If we also use **Delay**, we can target a term at some finite number of steps before the property is checked. This is captured by Algorithm 2, which takes a transition system, a property $P(X)$, a term $t$, and $n \geq 0$. If $n = 0$, it introduces a universal prophecy variable for $t$. Otherwise, it first introduces history variables for $t$ and then applies universal prophecy to the delayed $t$. In either case it returns the augmented system, the augmented property, and the prophecy variable.

We will use the STS shown in Fig. 1(a) as a running example throughout the paper (it is inspired by the hardware example from [10]). We assume the background theory $T$ includes integer arithmetic and arrays of integers indexed by integers. The variables in this STS include an array $a$ and four integer variables $i_r$, $i_w$, $d_r$, and $d_w$, representing the read index, write index, read data, and write data, respectively. The system starts with an array of all zeros. At every step, if the write data is less than 200, it writes that data to the array at the write index. Otherwise, the array stays the same. Additionally, the read data is updated with the current value of $a$ at $i_r$. This effectively introduces a one-step delay between when the value is read from $a$ and when the value is present in $d_r$. The property is $d_r < 200$. This property is clearly true, but it is not straightforward to prove with standard model checking techniques because it is not inductive. Note that it is also not $k$-inductive for any $k$ [59]. The primary issue is that the property does not constrain the value of $a$ at all, so in an inductive proof, the value of $a$ could be anything in the induction hypothesis.
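The running example can also be simulated directly. The sketch below (with arbitrarily chosen index range 0–9 and data range 0–399, and the constant-zero array modeled as a dict with default 0) checks $d_r < 200$ along a random execution. Simulation of course only tests the property; it does not prove it.

```python
# Simulate the Fig. 1(a) system: writes below 200 update the array, reads
# are delayed one step into d_r. The property d_r < 200 is asserted at
# every step.
import random

def run(steps, seed=0):
    rng = random.Random(seed)
    a, d_r = {}, 0                     # constarr(0) modeled by default 0
    for _ in range(steps):
        i_r, i_w = rng.randrange(10), rng.randrange(10)
        d_w = rng.randrange(400)       # write data may exceed 200
        a_next = dict(a)
        if d_w < 200:
            a_next[i_w] = d_w          # a' = write(a, i_w, d_w)
        d_r = a.get(i_r, 0)            # d_r' = read(a, i_r): one-step delay
        a = a_next
        assert d_r < 200               # the property P
    return d_r

run(1000)
```

Every value ever stored in `a` is either the default 0 or some `d_w < 200`, which is exactly the content of the quantified invariant discussed next; the simulation merely confirms it on sampled runs.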

One way to prove the property is to strengthen it with the quantified invariant $\forall i.\ read(a, i) < 200$. Remarkably, by augmenting the system using **Prophecize**, it is possible to prove the property using only a quantifier-free invariant. In this case, the relevant prophecy target is the value of $i_r$ one

$$\begin{aligned} \textbf{(a)}\quad I &:= a = \mathit{constarr}(0) \wedge d_r < 200 \\ T &:= a' = \mathit{ite}(d_w < 200, \mathit{write}(a, i_w, d_w), a) \wedge d_r' = \mathit{read}(a, i_r) \\ P &:= d_r < 200 \end{aligned}$$

$$\begin{aligned} \textbf{(b)}\quad I &:= a = \mathit{constarr}(0) \wedge d_r < 200 \\ T &:= a' = \mathit{ite}(d_w < 200, \mathit{write}(a, i_w, d_w), a) \wedge d_r' = \mathit{read}(a, i_r) \wedge p_{i_r}^{1\prime} = p_{i_r}^1 \wedge h_{i_r}^{1\prime} = i_r \\ P &:= p_{i_r}^1 = h_{i_r}^1 \implies d_r < 200 \end{aligned}$$

**Fig. 1:** (a) Running example. (b) Running example with prophecy variable.

step before checking the property. We run **Prophecize**($\langle X, I, T\rangle$, $P$, $i_r$, 1), and it returns the system and property shown in Fig. 1(b), along with the prophecy variable $p_{i_r}^1$. This augmented system has a simple, quantifier-free invariant that can be used to strengthen the property, making it inductive: $\mathrm{read}(a, p_{i_r}^1) < 200$. This formula holds in the initial state because of the constant array, and if we start in a state where it holds, it still holds after a transition.

Notice that the invariant learned over the prophecy variable has the same form as the original quantified invariant. However, we have instantiated that universal quantifier with a fresh, frozen prophecy variable. Intuitively, the prophecy variable captures a proof by contradiction: assume the property does not hold, consider the value of $i_r$ one step before the first failure of the property, and then use this value to show that the property holds after all, a contradiction. This example shows that auxiliary variables can be used to transform an STS without a quantifier-free inductive invariant into an STS with one. However, it is not yet clear how to identify good targets for history and prophecy variables. In the next section, we show how this can be done as part of an abstraction refinement scheme for symbolic transition systems over the theory of arrays.
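The augmentation itself can be sketched as a purely syntactic transformation. Below, a system is a tuple of formula strings; the helper and the `h`/`p` naming scheme are illustrative assumptions of ours, not the paper's implementation:

```python
def prophecize(X, init, trans, prop, target, d):
    # Sketch of Prophecize(<X, I, T>, P, t, d): add a d-step history chain
    # h1..hd for `target` and a frozen prophecy variable p; return the
    # augmented system, the modified property, and the prophecy variable.
    h = [f"h{j}_{target}" for j in range(1, d + 1)]
    p = f"p{d}_{target}"
    chain = [f"{h[0]}' = {target}"]                        # h1' = target
    chain += [f"{h[j]}' = {h[j-1]}" for j in range(1, d)]  # hj' = h(j-1)
    trans = " & ".join([trans] + chain + [f"{p}' = {p}"])  # p is frozen
    prop = f"({p} = {h[-1]}) -> ({prop})"
    return X + h + [p], init, trans, prop, p

# Reproduces the augmentation of Fig. 1(b) for target i_r with delay 1:
X, I, T, P, p = prophecize(
    ["a", "i_r", "i_w", "d_r", "d_w"],
    "a = constant(0) & d_r < 200",
    "a' = ite(d_w < 200, write(a, i_w, d_w), a) & d_r' = read(a, i_r)",
    "d_r < 200", "i_r", 1)
assert p == "p1_i_r"
assert P == "(p1_i_r = h1_i_r) -> (d_r < 200)"
assert T.endswith("h1_i_r' = i_r & p1_i_r' = p1_i_r")
```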

### **4 Abstraction Refinement for Arrays**

We now introduce our main contribution. Given a background theory $T_B$ and a model checking algorithm for STSs over $T_B$, we use an instantiation of the CEGAR loop in Algorithm 1 to check properties of STSs over the combination of $T_B$ with the theory of arrays, $T_A$. The key idea is to abstract all array operators and then add array lemmas as needed during refinement.

**Abstract and Prove.** We use a standard abstraction for the theory of arrays, which we denote **Abstract-Arrays**. Every array sort is replaced with an uninterpreted sort, and the array variables are abstracted accordingly. Each constant array is replaced by a fresh abstract array variable, which is then constrained to be frozen (because constant arrays do not change over time). Additionally, we replace the read and write array operations with uninterpreted functions. Note that if the system contains multiple array sorts, we need to introduce a separate read and write function for each uninterpreted abstract array sort. Using uninterpreted sorts and functions to abstract arrays is a common technique in Satisfiability Modulo Theories (SMT) [7] solvers [32]. Intuitively, our initial abstraction starts with memoryless arrays. We then incrementally refine the arrays'

$$\begin{aligned}
\widehat{I} &:= \widehat{a} = \widehat{\mathit{constant0}} \land d_r < 200\\
\widehat{T} &:= \widehat{a}' = \mathrm{ite}(d_w < 200, \widehat{\mathrm{write}}(\widehat{a}, i_w, d_w), \widehat{a}) \land d_r' = \widehat{\mathrm{read}}(\widehat{a}, i_r) \land \widehat{\mathit{constant0}}' = \widehat{\mathit{constant0}}\\
\widehat{P} &:= d_r < 200
\end{aligned}$$

**Fig. 2:** Result of calling **Abstract** on the example from Fig. 1(a)

memory as needed. Fig. 2 shows the result of running **Abstract-Arrays** on the example from Fig. 1(a). **Prove** can be instantiated with any (unbounded) model checker that accepts expressions over the background theory $T_B$ combined with the theory of uninterpreted functions. In particular, due to our abstraction, the model checker does not need to support the theory of arrays.
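As a sketch, **Abstract-Arrays** can be viewed as a rewrite over term trees. Here terms are nested tuples `(operator, args...)`, and the `_uf` and `abs_constarr` naming conventions are illustrative assumptions of ours:

```python
def abstract_term(t, fresh):
    """Replace read/write/constant-array operators with uninterpreted
    function symbols; collect a fresh abstract variable per constant
    array (the caller then adds the frozen constraint name' = name)."""
    if not isinstance(t, tuple):
        return t                      # variable or constant leaf
    op, *args = t
    args = [abstract_term(a, fresh) for a in args]
    if op == "constarr":              # constant array -> fresh frozen var
        name = f"abs_constarr_{args[0]}"
        fresh.add(name)
        return name
    if op in ("read", "write"):       # array ops -> uninterpreted functions
        return (op + "_uf", *args)
    return (op, *args)

fresh = set()
t = ("ite", ("lt", "d_w", 200), ("write", ("constarr", 0), "i_w", "d_w"), "a")
abs_t = abstract_term(t, fresh)
assert abs_t == ("ite", ("lt", "d_w", 200),
                 ("write_uf", "abs_constarr_0", "i_w", "d_w"), "a")
assert fresh == {"abs_constarr_0"}
```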

**Refine.** Here, we explain the refinement approach for our array abstraction. At a high level, we solve a BMC problem over the abstract STS at bound k. We then look for violations of array axioms in the returned counterexample, and instantiate each violated axiom (this is essentially the same as the lazy array axiom instantiation approach used in SMT solvers [13,14,17,27]). We then lift these axioms to the STS-level by modifying the STS. It is this step that may require introducing auxiliary variables. The details are shown in Algorithm 3.

We start by computing a set $\mathcal{I}$ of index terms with ComputeIndices; this set is used in the lazy axiom instantiation step below. We add to $\mathcal{I}$ every term that appears as an index in a read or write operation in BMC(S, P, k). We also add a witness index for every array equality; the witness corresponds to a skolemized existential variable in the contrapositive of axiom (ext). For soundness, we must add an extra variable $\lambda_\sigma$ for each index sort $\sigma$ and constrain it to be different from all the other index variables of the same sort (this is based on the approach in [13]). Intuitively, this variable represents an arbitrary index different from those mentioned in the STS. We assume that the index sorts are from an infinite domain so that a distinct element is guaranteed. For simplicity of presentation, we also assume from now on that there is only a single index sort (e.g., integers); otherwise, $\mathcal{I}$ must be partitioned by sort. For the abstract STS in Fig. 2, with k = 1, the index set would be $\mathcal{I} := \{i_r@0, i_w@0, w_0@0, w_1@0, \lambda_{Int}@0, i_r@1, i_w@1, w_0@1, w_1@1, \lambda_{Int}@1\}$, where $w_0$ and $w_1$ are witness indices.
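A sketch of ComputeIndices under the same tuple representation (assuming a single index sort; the witness and lambda naming are ours):

```python
def compute_indices(bmc_terms, n_eqs, k):
    """Gather the index set from an unrolled BMC formula: every term used
    as a read/write index, plus a skolem witness w_j per array equality
    and a distinguished fresh index (lambda) at each step 0..k."""
    idx = set()

    def walk(t):
        if not isinstance(t, tuple):
            return
        op, *args = t
        if op in ("read_uf", "write_uf"):
            idx.add(args[1])          # the index argument
        for a in args:
            walk(a)

    for t in bmc_terms:
        walk(t)
    for n in range(k + 1):
        for j in range(n_eqs):
            idx.add(f"w{j}@{n}")      # witness for the contrapositive of (ext)
        idx.add(f"lam@{n}")           # constrained distinct from all others
    return idx

# For the abstract STS of Fig. 2 unrolled to k = 1 (two array equalities),
# this reproduces the index set given in the text:
terms = [("read_uf", "a@0", "i_r@0"), ("write_uf", "a@0", "i_w@0", "d_w@0"),
         ("read_uf", "a@1", "i_r@1"), ("write_uf", "a@1", "i_w@1", "d_w@1")]
assert compute_indices(terms, n_eqs=2, k=1) == {
    "i_r@0", "i_w@0", "w0@0", "w1@0", "lam@0",
    "i_r@1", "i_w@1", "w0@1", "w1@1", "lam@1"}
```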

After computing indices, the algorithm enters the main loop. We first check the BMC(S, P, k) query. The result $\rho$ is either a counterexample or the distinguished value ⊥, indicating that the query is unsatisfiable. If it is the latter, we return the refined STS and property, as the property now holds on the STS up to bound k. Otherwise, we continue. The next step (line 5) is to find violations of array axioms in the execution $\rho$, based on the index set $\mathcal{I}$.

CheckArrayAxioms takes two arguments, a counterexample and an index set, and returns instantiated array axioms that do not hold over the counterexample. This works as follows. We first look for occurrences of write in the BMC formula.

**Algorithm 3 Refine-Arrays**($S := \langle X, I, T\rangle$, $P$, $k$)

1: 𝓘 ← *ComputeIndices*(S, P, k) // 𝓘 is the index set (distinct from the initial-state formula I)
2: **loop**
3:   ρ ← BMC(S, P, k)
4:   **if** ρ = ⊥ **then return** ⟨X, I, T⟩, P, *true* // Property holds up to bound k
5:   ca, nca ← *CheckArrayAxioms*(ρ, 𝓘)
6:   **if** ca = ∅ ∧ nca = ∅ **then return** ⟨X, I, T⟩, P, *false* // True counterexample
7:   // Go through non-consecutive array axiom instantiations
8:   **for** ⟨ax, i@n_i⟩ ∈ nca **do**
9:     **let** n_min := *min*(τ(ax) \ {n_i})
10:    ⟨X_p, I_p, T_p⟩, P_p, p_i^{k−n_i} ← **Prophecize**(⟨X, I, T⟩, P, i, k − n_i)
11:    ax_c ← ax{i@n_i → p_i^{k−n_i}@n_min}
12:    ca ← ca ∪ {ax_c@n_min} // add consecutive version of axiom
13:    𝓘 ← 𝓘 ∪ {p_i^{k−n_i}@0, …, p_i^{k−n_i}@k}
14:    X ← X_p; I ← I_p; T ← T_p; P ← P_p
15:  **end for**
16:  // Go through consecutive array axiom instantiations
17:  **for** ax ∈ ca **do**
18:    **let** n_min := *min*(τ(ax)), n_max := *max*(τ(ax))
19:    assert(n_max = n_min ∨ n_max = n_min + 1)
20:    **if** k = 0 **then**
21:      I ← I ∧ ax{X@n_min → X}
22:    **else if** n_min = n_max **then**
23:      T ← T ∧ ax{X@n_min → X} ∧ ax{X@n_min → X′}
24:    **else**
25:      T ← T ∧ ax{X@n_min → X}{X@(n_min + 1) → X′}
26:    **end if**
27:  **end for**
28: **end loop**

For each such occurrence, we instantiate the (write) axiom so that the write term in the axiom matches the term in the formula (i.e., we use the write term as a trigger). This instantiates all quantified variables except for $i$. We then instantiate $i$ once for each variable in the index set. We evaluate each of the instantiated axioms using the values from the counterexample and keep those instantiations that reduce to false. We do the same for the (const) axiom, using each constant array term in the BMC formula as a trigger. Finally, for each array equality $a@m = b@n$ in the BMC formula, we check an instantiation of the contrapositive of (ext): $a@m \neq b@n \implies \mathrm{read}(a@m, w_i@n) \neq \mathrm{read}(b@n, w_i@n)$. We add instantiated formulas that do not hold in $\rho$ to the set of violated axioms.
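Checking a (write) instantiation against an abstract counterexample amounts to evaluating it under the solver's model. A sketch, where the model ρ is a plain dictionary (this representation, and the function name, are assumptions of ours):

```python
def write_axiom_holds(rho, arr, j, v, i):
    """Evaluate one instantiation of the (write) axiom under an abstract
    counterexample rho: read(write(arr, j, v), i) must return v when
    i = j, and read(arr, i) otherwise. rho maps timed variable names to
    values; rho["write"] and rho["read"] are the solver's (partial)
    interpretations of the uninterpreted write/read functions."""
    wa = rho["write"][(arr, rho[j], rho[v])]   # abstract write result
    if rho[i] == rho[j]:
        return rho["read"][(wa, rho[i])] == rho[v]
    return rho["read"][(wa, rho[i])] == rho["read"][(arr, rho[i])]

# A spurious model: the abstract read returns 7 at the just-written
# index 2 instead of the written value 5, so this instantiation is
# violated and gets collected for refinement.
rho = {
    "i_w@0": 2, "d_w@0": 5, "i_r@2": 2,
    "write": {("a@0", 2, 5): "wa0"},
    "read": {("wa0", 2): 7, ("a@0", 2): 0},
}
assert not write_axiom_holds(rho, "a@0", "i_w@0", "d_w@0", "i_r@2")
```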

CheckArrayAxioms sorts the collected axiom instantiations into two sets based on which timed variables they contain. The consecutive set contains formulas with timed variables whose timing differs by at most one, whereas the timed variables in the formulas contained in the non-consecutive set may differ by more. Formally, let $\tau$ be a function which takes a single timed variable and returns its time (e.g., $\tau(i@2) = 2$). We lift this to formulas by having $\tau(\phi)$ return the set of all time-steps for variables in $\phi$. A formula $\phi$ is consecutive iff $max(\tau(\phi)) - min(\tau(\phi)) \le 1$. Note that instantiations of (ext) are consecutive by construction. Additionally, because constant arrays have the same value in all time steps, we can always choose a representative time step for instantiations of (const) that results in a consecutive formula. However, instantiations of (write) may be non-consecutive, because the variable from the index set may be from a time step that is different from that of the trigger term. CheckArrayAxioms returns the pair $\langle ca, nca\rangle$, where $ca$ is a set of consecutive axiom instantiations and $nca$ is a set of pairs, each of which contains a non-consecutive axiom instantiation and the index-set variable that was used to create that instantiation.
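The consecutiveness check reduces to comparing time-steps. A sketch, where a formula is represented simply by the set of its timed-variable names:

```python
def tau(formula_vars):
    """Time-steps of the timed variables occurring in a formula; here a
    formula is represented by the set of its timed-variable names."""
    return {int(v.split("@")[1]) for v in formula_vars}

def is_consecutive(formula_vars):
    ts = tau(formula_vars)
    return max(ts) - min(ts) <= 1

assert tau({"i_r@2", "i_w@0"}) == {0, 2}
# A (write) instantiation mixing steps 0 and 2 is non-consecutive:
assert not is_consecutive({"i_r@2", "i_w@0", "d_w@0"})
# (ext) instantiations are consecutive by construction:
assert is_consecutive({"a@0", "b@1", "w0@1"})
```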

At line 6, we check whether the returned sets are empty. If so, there are no array axiom violations and $\rho$ is a concrete counterexample; in this case, the system, property, and false are returned. Otherwise, we process the two sets. In lines 8-15, we process the non-consecutive formulas. Given a non-consecutive formula $ax$ together with its index-set variable $i@n_i$, we first compute the minimum time-step of the axiom's other variables, $n_{min}$. We then use the **Prophecize** method to create a prophecy variable $p_i^{k-n_i}$, which is effectively a way to refer to $i@n_i$ at time-step $n_{min}$ (line 10). This allows us to create a consecutive formula $ax_c$ that is semantically equivalent to $ax$ (line 11). This new consecutive formula is added to $ca$ in line 12, and in line 13 the introduced prophecy variables (one for each time-step) are added to the index set. Then, line 14 updates the abstraction.

At line 17, we are left with a set of consecutive formulas to process. For each consecutive formula $ax$, we compute the minimum and maximum time-steps of its variables (line 18), which must differ by no more than 1 (line 19). There are three cases to consider: i) when k = 0, the counterexample consists of only the initial state, so we refine the initial state by adding the untimed version of $ax$ to $I$ (line 21); ii) if $ax$ contains only variables from a single time-step, then we add the untimed version of $ax$ as a constraint over both $X$ and $X'$, ensuring that it holds in every state (line 23); iii) finally, if $ax$ contains variables from two adjacent time-steps, we can translate it directly into a transition formula to be added to $T$ (line 25). The loop then repeats with the newly refined STS.
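Lifting a consecutive axiom to the transition relation (lines 20-26) is a substitution from timed to untimed variables. A string-based sketch (the rewriting assumes the `@n` suffixes are unambiguous; when $n_{min} = n_{max}$, the algorithm adds both an unprimed and a primed copy):

```python
def lift_consecutive(ax, n_min, n_max):
    """Untime a consecutive axiom (a formula string over timed variables
    such as i_r@2): variables at step n_min become current-state (X) and
    variables at step n_min + 1 become next-state (X')."""
    assert n_max in (n_min, n_min + 1)
    out = ax.replace(f"@{n_min + 1}", "'")  # next-state copies -> primed
    return out.replace(f"@{n_min}", "")     # current-state copies -> unprimed

ax = "i_r@3 = i_w@2 -> read(write(a@2, i_w@2, d_w@2), i_r@3) = d_w@2"
assert lift_consecutive(ax, 2, 3) == \
    "i_r' = i_w -> read(write(a, i_w, d_w), i_r') = d_w"
```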

**Example.** Consider again the example from Fig. 2, and suppose **Refine-Arrays** is called on S and P with k = 3. At this unrolling, one possible abstract counterexample violates the following non-consecutive axiom instantiation:

$$\begin{aligned}
(i_r@2 = i_w@0 &\implies \widehat{\mathrm{read}}(\widehat{\mathrm{write}}(\widehat{a}@0, i_w@0, d_w@0), i_r@2) = d_w@0)\ \land\\
(i_r@2 \neq i_w@0 &\implies \widehat{\mathrm{read}}(\widehat{\mathrm{write}}(\widehat{a}@0, i_w@0, d_w@0), i_r@2) = \widehat{\mathrm{read}}(\widehat{a}@0, i_r@2))
\end{aligned}$$

Calling **Prophecize**(S, P, $i_r$, 1) returns the new STS $\langle X \cup \{h_{i_r}^1, p_{i_r}^1\}, I, T \land h_{i_r}^{1\,\prime} = i_r \land p_{i_r}^{1\,\prime} = p_{i_r}^1\rangle$ and the new property $p_{i_r}^1 = h_{i_r}^1 \implies d_r < 200$. The history variable $h_{i_r}^1$ makes the previous value of $i_r$ available at each time-step, and the prophecy variable $p_{i_r}^1$ mimics a universally quantified variable. We substitute $p_{i_r}^1@0$ for $i_r@2$ to obtain a consecutive formula. Its untimed version (and a primed version) is added to the transition relation.

We stress that processing non-consecutive axioms using **Prophecize** is how we automatically discover the universal prophecy variable $p_{i_r}^1$, and it is exactly the universal prophecy variable that was needed in Sec. 3 to prove correctness of the running example. An alternative approach could avoid non-consecutive axioms by using Craig interpolants [26] so that only consecutive axioms are found [15]. However, quantifier-free interpolants are not guaranteed to exist for the standard theory of arrays, and the auxiliary variables found using non-consecutive axioms are needed to improve the chances of finding a quantifier-free inductive invariant.

It is important to have enough prophecy variables to assist in constructing inductive invariants. We found that we could often obtain a larger, richer set of prophecy variables by weakening our array abstraction. We do this by replacing equality between arrays by an uninterpreted predicate, and also checking the congruence axiom, the converse of (ext). Since more axioms are checked, there are more opportunities to introduce auxiliary variables. We call this weak abstraction (**WA**) as opposed to strong abstraction (**SA**), which uses regular equality between abstract arrays and guarantees congruence through UF axioms.

On the other hand, an excessive number of unnecessary auxiliary variables could overwhelm the **Prove** step. Thus, an improvement not shown in Algorithm 3 is to check consecutive axioms first and only add nonconsecutive ones when necessary. This is the motivation behind the custom array solver implementation CheckArrayAxioms based on [13]. In principle, we could have used an SMT solver to find array axioms, but it would give no preference to consecutive axioms. Similarly, we could overwhelm the algorithm with unnecessary consecutive axioms. CheckArrayAxioms can still produce hundreds or even thousands of (consecutive) axiom instantiations. Once these are lifted to the transition system, some may be redundant. To mitigate this issue, when the BMC check returns ⊥ and we are about to return (line 4), we keep only axioms that appear in the unsat core of the BMC formula [22].

**Correctness.** We now state two important correctness theorems. Note that here and below, proofs are omitted due to space constraints. An extended version with proofs is available at: https://arxiv.org/abs/2101.06825.

**Theorem 4.** Algorithm 1, instantiated with *Abstract-Arrays*, a model-checker *Prove* as described above, and *Refine-Arrays* is sound.

**Theorem 5.** If Algorithm 1, instantiated with *Abstract-Arrays*, *Prove* as described above, and *Refine-Arrays*, returns false, there is a concrete counterexample of length k in the concrete transition system.

### **5 Expressiveness and Limitations**

We now address the expressiveness of counterexample-guided prophecy with regard to the introduction of auxiliary variables. For simplicity, we ignore the array abstraction, relying on the correctness theorems. An inductive invariant using auxiliary variables can be converted to one without auxiliary variables by first universally quantifying over the prophecy variables, then existentially quantifying over the history variables. The details are captured by this theorem:

**Theorem 6.** Let $S := \langle X, I, T\rangle$ be an STS, and let $P(X)$ be a property such that $S \models P(X)$. Let $H$ be the set of history variables and $\mathcal{P}$ the set of prophecy variables introduced by *Refine-Arrays*. Let $\tilde{S} := \langle X \cup H \cup \mathcal{P}, I, \tilde{T}\rangle$ and $\tilde{P} := (\bigwedge_{p \in \mathcal{P}} p = \tilde{t}(p)) \implies P(X)$ be the system and property with auxiliary variables, where the function $\tilde{t}$ maps each prophecy variable to its target term from *Prophecize*. If $Inv(X, H, \mathcal{P})$ is an inductive invariant for $\tilde{S}$ that entails $\tilde{P}$, then $\exists H\, \forall \mathcal{P}.\ Inv(X, H, \mathcal{P})$ is an inductive invariant for $S$ that entails $P$, where $\exists H$ and $\forall \mathcal{P}$ bind each variable in the corresponding set.
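As a worked instance (ours, not part of the theorem statement), consider the running example: the quantifier-free strengthening invariant from Sec. 3 mentions no history variable, so the existential quantifier is vacuous and quantifying out the prophecy variable gives

$$
\exists h_{i_r}^1\, \forall p_{i_r}^1.\ \mathrm{read}(a, p_{i_r}^1) < 200 \;\equiv\; \forall i.\ \mathrm{read}(a, i) < 200,
$$

which is exactly the quantified invariant used in Sec. 3.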

Although the invariants found using counterexample-guided prophecy correspond to ∃∀ invariants over the unmodified system, we must acknowledge that the existential power is very weak. The existential quantifier is only used to remove history variables. While history variables can certainly be employed for existential power in an invariant [55], these specific history variables are introduced solely to target a term for prophecy and only save a term for some fixed, finite number of steps. Thus, we do not expect to gain much existential power in finding invariants on practical problems. This use of history and prophecy variables can be thought of as quantifier instantiation at the model checking level, where the instantiation semantically uses a term appearing in an execution of the system. Consequently, our technique performs well on systems where there is only a small number of instantiations needed over terms that are not too distant in time from a potential property violation that must be disproved (i.e., not many history variables are required). This appears to be a common situation for invariant-finding benchmarks, as we show empirically in Sec. 6.

**Limitations.** If our CEGAR loop terminates, it either terminates with a proof or with a true counterexample. However, it is possible that the procedure may not terminate. In particular, while we can always refine the abstraction for a given bound k, there is no guarantee that this will eventually result in a refinement that rules out all spurious counterexamples (of any length).

This failure mode occurs, for instance, when no finite number of instantiations can capture all the relevant indices of the array. Consider an example system with $I := a = \mathrm{constarr}(0)$, $T := a' = \mathrm{write}(a, i_0, \mathrm{read}(a, i_1) + 1)$, and $P := \mathrm{read}(a, i_r) \ge 0$. The array $a$ is initialized with 0 at every index, and at every step, $a$ is updated at a single index by reading from an arbitrary index of $a$ and adding 1 to the result. Note that the index variables are unconstrained: they can range over the integers freely at each time-step. The property is that every element of $a$ is non-negative. This property clearly holds because of a quantified invariant maintained by the system: $\forall i.\ \mathrm{read}(a, i) \ge 0$.
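The concrete behavior can be checked directly; the following sketch (representation ours) simulates the system above and confirms that the quantified invariant, and hence the property, holds along arbitrary executions, so every bounded abstract counterexample is spurious:

```python
import random

# Limitation example: `a` starts all-zero and each step writes
# read(a, i1) + 1 at index i0, for arbitrary i0, i1.

def read(a, i):
    return a.get(i, 0)

random.seed(1)
a = {}
for _ in range(1000):
    i0, i1 = random.randrange(10), random.randrange(10)
    a = {**a, i0: read(a, i1) + 1}
    # The quantified invariant forall i. read(a, i) >= 0, hence P, holds:
    assert all(read(a, i) >= 0 for i in range(10))
```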

However, the initial abstraction is a memoryless array, which can easily violate the property by returning negative values from reads. Since the array is updated at each step at an arbitrary index based on a read from another arbitrary index, no finite number of prophecy variables can capture all the relevant indices. The refinement loop will successively rule out longer spurious counterexamples, but the abstraction will never be refined enough to prove the property unboundedly. We believe that this limitation can be addressed in future work, perhaps by adapting techniques from [52]; however, it is not yet clear how to automate that process. Note that an even simpler system which does not add 1 in the update would already be problematic; for that case, however, it is straightforward to extend our algorithm so that it learns that the array does not change.

A related, but less fundamental issue is that the index set might not contain the best choice of targets for prophecy. While the index set is sufficient for ruling out bounded counterexamples, it is possible there is a better target for universal prophecy that does not appear in the index set. However, based on the evaluation in Sec. 6, it appears that the index set does work well in practice.

### **6 Experiments**

**Implementation.** In this section, we evaluate a prototype implementation of counterexample-guided prophecy, which instantiates **Prove** with ic3ia [34] (downloaded Apr 27, 2020), an open-source C++ implementation of IC3 via Implicit Predicate Abstraction (IC3IA) [20], which is itself a CEGAR loop that uses implicit predicate abstraction to perform IC3 [12] on infinite-state systems and uses interpolants to find new predicates. ic3ia uses MathSAT [21] (version 5.6.3) as the backend SMT solver and interpolant producer. We call our prototype prophic3 [48]. In our implementation, we also include a simple abstraction-refinement wrapper which abstracts large constant integers and refines them with the actual values if that fails. This is especially useful for dealing with software benchmarks with large constant loop bounds. Otherwise, the system might need to be unrolled to a very large bound to reach an abstract counterexample.

**Setup.** We evaluate our tool against three state-of-the-art tools for inferring universally quantified invariants over linear arithmetic and arrays: freqhorn, quic3, and gspacer. All these tools are Constrained Horn Clause (CHC) solvers built on Z3 [54]. The algorithm implemented in freqhorn [28] is a syntax-guided synthesis [4] approach for inferring universally quantified invariants over arrays [29]. quic3 is built on Spacer [40], the default CHC engine in Z3, and extends IC3 over linear arithmetic and arrays to allow universally quantified frames (frames are candidates for inductive invariants maintained by the IC3 algorithm). It also maintains a set of quantifier instantiations which are provided to the underlying SMT solver. quic3 was recently incorporated into Z3. We used Z3 version 4.8.9 with parameters suggested by the quic3 authors.<sup>4</sup> Finally, gspacer is an extension of Spacer which adds three new inference rules for improving local generalizations with global guidance. While this last technique does not specifically target universally quantified invariants, it can be used along with the quic3 options in Spacer and potentially executes a much different search. The gspacer

<sup>4</sup> `fp.spacer.q3.use_qgen=true fp.spacer.ground_pobs=false fp.spacer.mbqi=false fp.spacer.use_euf_gen=true`


**Fig. 3:** Experimental results. The safe results are reported as *# Q* / *# QF*. The second column per group shows unsafe results; the first two groups had only safe benchmarks.

submission [43] won the arrays category in CHC-COMP 2020 [58]. We also include ic3ia and the default configuration of Spacer in our results, neither of which can produce universally quantified invariants. Our default configuration of prophic3 uses weak abstraction, but we also include a version running strong abstraction (prophic3-SA) in our experiments. We chose to build our prototype on ic3ia instead of Spacer, in part because we needed uninterpreted functions for our array abstraction, and Spacer does not handle them in a straightforward way, due to the semantics of CHC [11].

We compare these solvers on four benchmark sets: i) freqhorn - benchmarks from the freqhorn paper [29]; ii) quic3 - benchmarks from the quic3 paper [37] (these were C programs from SV-COMP [8] that were modified to require universally quantified invariants); iii) vizel - additional benchmarks provided to us by the authors of [37]; and iv) chc-comp-2020 - the array category benchmarks of CHC-COMP 2020 [57]. Additionally, we sort the benchmarks into three categories: 1) Q - safe benchmarks solved by some tool supporting quantified invariants but none of the solvers that do not; 2) QF - those solved by at least one of the tools that do not support quantified invariants, plus any unsafe benchmarks; and 3) U - unsolved benchmarks. Because not all of the benchmark sets were guaranteed to require quantifiers, this is an approximation of which benchmarks required quantified reasoning to prove safe.

Both prophic3 and ic3ia take a transition system and property specified in the Verification Modulo Theories (VMT) format [23], which is a transition system format built on SMT-LIB [6]. All other solvers read the CHC format. We translated benchmark sets i and iv from CHC to VMT using the horn2vmt program which is distributed with ic3ia. For benchmark sets ii and iii, we started with the C programs and generated both VMT and CHC using Kratos2 (an updated version of Kratos [19]). We ran all experiments on a 3.5 GHz Intel Xeon E5-2637 v4 CPU with a timeout of 2 hours and a memory limit of 32 GB. An artifact for reproducing these results is publicly available [49,38].

**Results.** The results are shown in Fig. 3. We first observe that prophic3 solves the most benchmarks in each of the first three sets, both overall and in category Q. The quic3 (and most of the freqhorn) benchmarks require quantified invariants; thus, ic3ia and Spacer cannot solve any of them. On solved instances in the Q category, prophic3 introduced an average of 1.2 prophecy variables and a median of 1. This makes sense because, upon inspection, most benchmarks only require one quantifier and we are careful to only introduce prophecy variables when needed. On benchmarks it cannot solve, ic3ia either times out or fails to compute an interpolant. This is expected because quantifier-free interpolants are not guaranteed over the standard theory of arrays. Even without arrays, it is also possible for prophic3 to fail to compute an interpolant, because MathSAT's interpolation procedure is incomplete for combinations with non-convex theories such as integers. However, this was rarely observed in practice.

We also observe that prophic3-SA solves fewer benchmarks in the first three sets. However, it is faster on commonly solved instances. This makes sense because it needs to check fewer axioms (it uses built-in equality and thus does not check equality axioms). We suspect that it solves fewer benchmarks in the first three sets because it was unable to find the right prophecy variable. For example, for the `standard_find_true-unreach-call_ground` benchmark in the quic3 set, a prophecy variable is needed to find a quantifier-free invariant. However, because of the stronger reasoning power of **SA**, the system can be sufficiently refined without introducing auxiliary variables. ic3ia is then unable to prove the property on the resulting system without the prophecy variable, instead timing out. Interestingly, prophic3-SA solves the most benchmarks overall in the QF category, suggesting that there are practical performance benefits to the CEGAR approach even when quantified reasoning is not needed.

There was one discrepancy on the CHC-COMP 2020 benchmarks: gspacer disagrees with quic3, Spacer, and prophic3 on `chc-LIA-lin-arrays_381`. This is the same discrepancy mentioned in the CHC-COMP 2020 report [58]. prophic3 proved this benchmark safe without introducing any auxiliary variables, and we used both CVC4 [5] and MathSAT to verify that the solution was indeed an inductive invariant for the concrete system. We are confident that this benchmark is safe and thus do not count it as a solved instance for gspacer.

Some of the tools are sensitive to the encoding. Since it is syntax-guided, freqhorn is sensitive to the encoding syntax. The freqhorn benchmarks were hand-written to be syntactically simple, an encoding which is also good for prophic3. However, prophic3 can be sensitive to other encodings. For example, the quic3 benchmarks are also included in the chc-comp-2020 set, but translated by SeaHorn [35] instead of Kratos2. prophic3 does much worse on the SeaHorn encoding (6 vs 42). We stress that the CHC solvers performed similarly on both encodings, so we did not compare against disadvantaged solvers. In fact, quic3 and freqhorn solved exactly the same number in both translations. However, gspacer solved fewer using the Kratos2 encoding (27 vs 34). Importantly, prophic3 on the Kratos2 encoding solved more benchmarks than any other tool and encoding pair.

There are two main reasons why prophic3 fails on the SeaHorn encodings. First, due to the LLVM-based encoding, some of the SeaHorn translations have index sets which are insufficient for finding the right prophecy variable. This has to do with the memory encoding and the way that fresh variables and guards are used. SeaHorn also splits memories into ranges, which is problematic for our technique. Second, the SeaHorn translation is optimized for CHC, not for transition systems. For example, it introduces many new variables, and the argument order between different predicates may not match. In the transition system, this essentially has the effect of interchanging the values of variables between loop iterations. SeaHorn has options that address some of these issues, and these helped prophic3 solve more benchmarks, but none of these options produce encodings that work as well as the Kratos2 encodings. The difference between good CHC and transition system encodings could also explain the overall difference in performance on the chc-comp-2020 benchmarks, most of which were translated by SeaHorn. Both of these issues are practical, not fundamental, and we believe they can be resolved with additional engineering effort.

### **7 Related Work**

There are two important related approaches for abstracting arrays in Horn clauses [53] and memories in hardware [10]. Both make a similar observation that arrays can be abstracted by modifying the property to maintain values at only a finite set of symbolic indices. We differ from the former by using a refinement loop that automatically adjusts the precision and targets relevant indices. The latter is also a refinement loop that adjusts precision, but differs in the domain and the refinement approach, which uses a multiplexer tree. We differ from both approaches in our use of array axioms to find and add auxiliary variables.

A similar lazy array axiom instantiation technique is proposed in [15]. However, their technique utilizes interpolants for finding violated axioms and cannot infer universally quantified invariants. The work of [18] also uses lazy axiom-based refinement, abstracting non-linear arithmetic with uninterpreted functions. We differ in the domain and the use of auxiliary variables. In [55], prophecy variables defined by temporal logic formulas are used for liveness and temporal proofs, with the primary goal of increasing the power of a temporal proof system. In contrast, we use prophecy variables here for a different purpose, and we also find them automatically. The work of [24] includes an approach for synthesizing auxiliary variables for modular verification of concurrent programs. Our approach differs significantly in the domain and details.

There is a substantial body of work on automated quantified invariant generation for arrays using first-order theorem provers [42,16,41,51]. These include extensions to saturation-based theorem proving to analyze specific kinds of predicates, and an extension to paramodulation-based theorem proving to produce universally quantified interpolants. In [46], the authors propose an abstract interpretation approach to synthesize universally quantified array invariants. Our method also uses abstraction, but in a CEGAR framework.

Two other notable approaches capable of proving properties over arrays that require invariants with alternating quantifiers are [30,56]. The former proposes trace logic for extending first-order theorem provers to software verification, and the latter takes a counterexample-guided inductive synthesis approach. Our approach takes a model checking perspective and differs significantly in the details.

While these approaches are more general, we compared against state-of-the-art tools that focus specifically on universally quantified invariants.

MCMT [31,33,25] and its derivatives [2,3] are backward-reachability algorithms for proving properties over "array-based systems," which are typically used to model parameterized protocols. These approaches target syntactically restricted functional transition systems with universally quantified properties, whereas our approach targets general transition systems. Two other approaches for solving parameterized systems modeled with arrays are [36] and [47]. The former iteratively fixes the number of expected universal quantifiers, then eagerly instantiates them and encodes the invariant search to nonlinear CHC. The latter first uses a finite-state model checker to discover an inductive invariant for a specific parameterization and then applies a heuristic generalization process. We differ from all these techniques in domain and the use of auxiliary variables. Due to the limitations explained in Sec. 5, we do not expect our approach to work well for parameterized protocol verification without improvements.

In [45], heuristics are proposed for finding predicates with free indices that can be universally quantified in a predicate abstraction-based inductive invariant search. Our approach is counterexample-guided and does not utilize predicate abstraction directly (although IC3IA does). The authors of [39] propose a technique for Java programs that associates heap memory with the program location where it was allocated and generates CHC verification conditions. This enables the discovery of invariants over all heap memory allocated at that location, which implicitly provides quantified invariants. This is similar to our approach in that it gives quantification power without explicitly using quantifiers and in that their encoding removes arrays. However, we differ in that we focus on transition systems and utilize a different paradigm to obtain this implicit quantification.

### **8 Conclusion**

We presented a novel approach for model checking transition systems containing arrays. We observed that history and prophecy variables can be extremely useful for reducing quantified invariants to quantifier-free invariants. We demonstrated that an initially weak abstraction in our CEGAR loop can help us to automatically introduce relevant auxiliary variables. Finally, we evaluated our approach on four sets of interesting array-manipulating benchmarks. In future work, we hope to improve performance, explore a tighter integration with the underlying model checker, address the limitations described in Sec. 5, and investigate applications of counterexample-guided prophecy to other theories.

**Acknowledgments.** This work was supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE-1656518. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. Additional support was provided by DARPA, under grant No. FA8650-18-2-7854. We thank these sponsors for their support. We would also like to thank Alessandro Cimatti for his invaluable feedback on the initial ideas of this paper.

### **References**



# SAT Solving with GPU Accelerated Inprocessing

Muhammad Osama<sup>1</sup><sup>∗</sup>, Anton Wijs<sup>1</sup><sup>†</sup>, and Armin Biere<sup>2</sup><sup>‡</sup>

<sup>1</sup> Eindhoven University of Technology, Eindhoven, The Netherlands o.m.m.muhammad@tue.nl, a.j.wijs@tue.nl
<sup>2</sup> Johannes Kepler University, Linz, Austria biere@jku.at

Abstract. Since 2013, the leading SAT solvers in the SAT competition all use inprocessing, which, unlike preprocessing, interleaves search with simplifications. However, applying inprocessing frequently can still be a bottleneck, e.g., for hard or large formulas. In this work, we introduce the first attempt to parallelize inprocessing on GPU architectures. As memory is a scarce resource in GPUs, we present new space-efficient data structures and devise a data-parallel garbage collector. It runs in parallel on the GPU to reduce memory consumption and improve memory access locality. Our new parallel variable elimination algorithm is twice as fast as previous work. In experiments, our new solver PARAFROST solves many benchmarks faster on the GPU than its sequential counterparts.

Keywords: Satisfiability · Variable Elimination · Eager Redundancy Elimination · Parallel SAT Inprocessing · Parallel Garbage Collection · GPU.

### 1 Introduction

During the past decade, SAT solving has been used extensively in many applications, such as combinational equivalence checking [27], automatic test pattern generation [33, 40], automatic theorem proving [14], and symbolic model checking [7, 13]. Simplifying SAT problems prior to solving them has proven its effectiveness in modern conflict-driven clause learning (CDCL) SAT solvers [5, 6, 17], particularly when applied to real-world applications relevant to software and hardware verification [16, 20, 22, 24].

Since 2013, simplification techniques [8, 16, 19, 21, 41] have also been applied periodically *during* SAT solving, which is known as *inprocessing* [3–6, 23]. Applying inprocessing iteratively to large problems can be a performance bottleneck in the SAT solving procedure, or can even increase the size of the formula, negatively impacting the solving time.

Graphics processors (GPUs) have become attractive for general-purpose computing with the availability of the Compute Unified Device Architecture (CUDA) programming model. CUDA is widely used to accelerate applications that are computationally intensive w.r.t. data processing. For instance, we have applied GPUs to accelerate explicit-state model checking [11, 43], bisimilarity checking [42], the reconstruction of

© The Author(s) 2021

<sup>∗</sup> This work is part of the GEARS project with project number TOP2.16.044, which is (partly) financed by the Netherlands Organisation for Scientific Research (NWO).

<sup>†</sup> We gratefully acknowledge the support of NVIDIA Corporation with the donation of the GeForce Titan RTX used for this research.

<sup>‡</sup> Partially funded by the LIT AI Lab.

J. F. Groote and K. G. Larsen (Eds.): TACAS 2021, LNCS 12651, pp. 133–151, 2021. https://doi.org/10.1007/978-3-030-72016-2 8

genetic networks [12], wind turbine emulation [30], metaheuristic SAT solving [44], and SAT-based test generation [33]. Recently, we introduced SIGmA [34, 35] as the first SAT simplification preprocessor to exploit GPUs.

Contributions. Embedding GPU inprocessing in a SAT solver is highly non-trivial and has, to the best of our knowledge, never been attempted before. Efficient data structures are needed that allow parallel processing, and that support efficient adding and removing of clauses. For this purpose, we contribute the following:


In addition, we propose a new preprocessing technique targeted towards data-parallel execution, called *Eager Redundancy Elimination* (ERE), which is applicable to both original and learnt clauses. All contributions have been implemented in our solver PARAFROST and benchmarked on a larger set than considered previously in [34], using 493 application problems. We discuss the potential performance gain of GPU inprocessing and its impact on SAT solving, compared to a sequential version of our solver as well as CADICAL [6], a state-of-the-art solver developed by the last author.

### 2 Preliminaries

All SAT formulas in this paper are in conjunctive normal form (CNF). A CNF formula is a conjunction of m clauses C₁ ∧ … ∧ Cₘ, where each clause Cᵢ is a disjunction of k literals ℓ₁ ∨ … ∨ ℓₖ, and a literal is a Boolean variable x or its complement ¬x, which we refer to as x̄. We represent clauses by sets of literals, i.e., {ℓ₁, …, ℓₖ} represents the formula ℓ₁ ∨ … ∨ ℓₖ, and a SAT formula by a set of clauses, i.e., {C₁, …, Cₘ} represents the formula C₁ ∧ … ∧ Cₘ. With Sℓ, we refer to the set of clauses containing literal ℓ, i.e., Sℓ = {C ∈ S | ℓ ∈ C}. If for a variable x we have either Sₓ = ∅ or Sₓ̄ = ∅ (but not both), then the literal x̄ or x, respectively, is called a *pure literal*. A clause C is a *tautology* iff there exists a variable x with {x, x̄} ⊆ C, and C is *unit* iff |C| = 1.
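The set notation above can be made concrete in a small sketch (a hypothetical integer encoding for illustration, not the solver's actual data structures): a literal is a nonzero integer, negation is arithmetic negation, a clause is a frozenset, and a formula is a set of clauses.

```python
def clauses_with(S, lit):
    """S_l: the set of clauses containing literal l."""
    return {C for C in S if lit in C}

def is_pure(S, x):
    """x (or its complement) is pure iff exactly one of S_x, S_-x is empty."""
    return (len(clauses_with(S, x)) == 0) != (len(clauses_with(S, -x)) == 0)

def is_tautology(C):
    """C is a tautology iff it contains a literal together with its complement."""
    return any(-lit in C for lit in C)

# {{x1, complement of x2}, {x2, x3}}: x1 and x3 are pure, x2 is not
S = {frozenset({1, -2}), frozenset({2, 3})}
```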

In this paper we integrate GPU-accelerated inprocessing and CDCL [28, 32, 36]. One important aspect of CDCL is to learn from previous assignments to prune the search space and make better decisions in the future. This learning process involves the periodic adding of new *learnt* clauses to the input formula while CDCL is running.

In this paper, clauses are either considered to be LEARNT or ORIGINAL (*redundant* and *irredundant* in [23] and in the SAT solver CADICAL [6]). A LEARNT clause is added to the formula by the CDCL clause learning process, and an ORIGINAL clause is part of the formula from the very start. Furthermore, each assignment is associated with a *decision level* that acts as a time stamp, to monitor the order in which assignments are performed. The first assignment is made at decision level one.

Variable Elimination (VE). Variables can be removed from clauses by either applying the *resolution rule* or *substitution* (also known as gate-equivalence reasoning) [16, 23]. Concerning the former, we represent application of the resolution rule w.r.t. some variable x using a *resolving operator* ⊗ₓ on clauses C₁ and C₂. The result of applying the rule is called the *resolvent* [41]. It is defined as C₁ ⊗ₓ C₂ = (C₁ ∪ C₂) \ {x, x̄}, and can be applied iff x ∈ C₁ and x̄ ∈ C₂. The ⊗ₓ operator can be extended to resolve sets of clauses w.r.t. variable x. For a formula S, let L ⊂ S be the set of learnt clauses when we apply the resolution rule. The set of new resolvents is then defined as Rₓ(S) = {C₁ ⊗ₓ C₂ | C₁ ∈ Sₓ \ L ∧ C₂ ∈ Sₓ̄ \ L ∧ ¬∃y. {y, ȳ} ⊆ C₁ ⊗ₓ C₂}. Notice that the learnt clauses are ignored [23] (i.e., in practice, it is not effective to apply resolution on learnt clauses). The last condition ensures that no resolvent is a tautology. After eliminating variable x in S, the resulting formula S′ is defined as S′ = Rₓ(S) ∪ (S \ (Sₓ ∪ Sₓ̄)), i.e., the new resolvents are combined with the original and learnt clauses that do not reference x.
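As an illustration, resolution-based elimination can be sketched directly from these definitions (plain Python over the set representation, ignoring learnt clauses; an illustrative sketch, not PARAFROST's implementation):

```python
def resolve(C1, C2, x):
    """C1 resolved with C2 on x: (C1 | C2) minus {x, -x}; needs x in C1, -x in C2."""
    assert x in C1 and -x in C2
    return frozenset((C1 | C2) - {x, -x})

def eliminate(S, x):
    """S' = R_x(S) united with clauses not referencing x; tautologies dropped."""
    Sx = {C for C in S if x in C}
    Sxbar = {C for C in S if -x in C}
    resolvents = set()
    for C1 in Sx:
        for C2 in Sxbar:
            r = resolve(C1, C2, x)
            if not any(-lit in r for lit in r):  # skip tautological resolvents
                resolvents.add(r)
    return resolvents | (S - (Sx | Sxbar))

# Eliminating x1 from {{x1, x2}, {not-x1, x3}} yields {{x2, x3}}
S2 = eliminate({frozenset({1, 2}), frozenset({-1, 3})}, 1)
```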

Substitution detects patterns encoding logical gates, and substitutes the involved variables with their gate-equivalent counterparts. Previously [34], we only considered AND gates. In the current work, we add support for *Inverter*, *If-Then-Else* and *XOR* gate extractions. For all logical gates, substitution can be performed by resolving non-gate clauses (i.e., clauses not contributing to the gate itself) with gate clauses [23].

For instance, the first three clauses in the formula {{x, ā, b̄}, {x̄, a}, {x̄, b}, {x, c}} together encode a logical AND gate (x = a ∧ b), hence the final clause can be resolved with the second and the third clauses, producing the simplified formula {{a, c}, {b, c}}. Combining gate-equivalence reasoning with the resolution rule tends to result in smaller formulas compared to only applying the resolution rule [16, 23, 37].
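The AND-gate example can be replayed with a tiny resolution helper (an illustrative sketch; x = 1, a = 2, b = 3, c = 4, with negative integers denoting complements):

```python
def resolve(C1, C2, x):
    # C1 resolved with C2 on x, assuming x in C1 and -x in C2
    return frozenset((C1 | C2) - {x, -x})

gate = [frozenset({1, -2, -3}),   # {x, not-a, not-b}
        frozenset({-1, 2}),       # {not-x, a}
        frozenset({-1, 3})]       # {not-x, b}
non_gate = frozenset({1, 4})      # {x, c}

# Substitution resolves the non-gate clause with the gate clauses
# containing the complement of x, eliminating x:
simplified = {resolve(non_gate, g, 1) for g in gate if -1 in g}
# simplified == {{a, c}, {b, c}}
```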

Subsumption elimination. SUB performs *self-subsuming resolution* followed by *subsumption elimination* [16]. The former can be applied on clauses C₁, C₂ iff for some variable x, we have C₁ = C₁′ ∪ {x}, C₂ = C₂′ ∪ {x̄}, and C₂′ ⊆ C₁′. In that case, x can be removed from C₁. The latter is applied on clauses C₁, C₂ with C₂ ⊆ C₁. In that case, C₁ is redundant and can be removed. If C₂ is a LEARNT clause, it must be considered as ORIGINAL in the future, to prevent deleting it during learnt clause reduction, a procedure which attempts to reduce the number of learnt clauses [6, 23]. For instance, consider the formula S = {{a, b, c}, {ā, b}, {b, c, d}}. The first clause is self-subsumed by the second clause w.r.t. variable a and can be strengthened to {b, c}, which in turn subsumes the last clause {b, c, d}. The latter clause is then removed from S and the simplified formula becomes {{b, c}, {ā, b}}.

Blocked clause elimination. BCE [25] can remove clauses for which variable elimination always results in tautologies. Consider the formula S = {{a, b, c}, {ā, b̄}, {ā, c̄}}. All three literals a, b and c are blocking the first clause: resolving on a produces the tautologies {b, c, b̄} and {b, c, c̄}, resolving on b produces {a, ā, c}, and resolving on c produces {a, ā, b}. Hence the blocked clause {a, b, c} can be removed from S. Again, as for VE, only original clauses are considered.
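A direct check of the blocking condition (some literal of C yields only tautological resolvents) can be sketched as:

```python
def is_blocked(S, C):
    """C is blocked in S iff some literal l in C yields only tautological
    resolvents against the clauses containing the complement of l."""
    def tautology(r):
        return any(-lit in r for lit in r)
    for l in C:
        partners = [D for D in S if -l in D and D != C]
        if all(tautology((C | D) - {l, -l}) for D in partners):
            return True
    return False

# The example formula {{a,b,c}, {not-a,not-b}, {not-a,not-c}} with a=1, b=2, c=3:
S = {frozenset({1, 2, 3}), frozenset({-1, -2}), frozenset({-1, -3})}
blocked = is_blocked(S, frozenset({1, 2, 3}))
```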

Eager Redundancy Elimination. ERE is a new elimination technique that we propose, which repeats the following until a fixpoint is reached: for a given formula S and clauses C₁ ∈ S, C₂ ∈ S with x ∈ C₁ and x̄ ∈ C₂ for some variable x, if there exists a clause C ∈ S for which C ≡ C₁ ⊗ₓ C₂, then let S := S \ {C}. In this work, we restrict removing C to the condition (C₁ is LEARNT ∨ C₂ is LEARNT) =⇒ C is LEARNT.

If the condition holds, C is called a *redundancy* and can be removed without altering the original satisfiability. For example, consider S = {{a, c̄}, {c, b}, {d̄, c̄}, {b, a}, {a, d}}. Resolving the first two clauses gives the resolvent {a, b}, which is equivalent to the fourth clause in S. Also, resolving the third clause with the last clause yields {a, c̄}, which is equivalent to the first clause in S. ERE can remove either {a, c̄} or {a, b}, but not both. Note that this method is entirely different from *Asymmetric Tautology Elimination* in [21]. The latter requires adding so-called hidden literals to all clauses to check which clause is a hidden tautology. ERE can operate on learnt clauses and does not require adding literals, making it more effective and better suited to data parallelism.
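One ERE pass over the example can be sketched as follows (an illustrative sequential search, not the parallel GPU implementation; the learnt-ness side condition is included):

```python
def ere_step(S, learnt=frozenset()):
    """Remove one clause C in S that equals a resolvent C1 (x) C2 of clauses
    in S, subject to: (C1 learnt or C2 learnt) implies C learnt."""
    for C1 in S:
        for x in C1:
            if x < 0:
                continue  # consider each resolution pair once, on the positive side
            for C2 in S:
                if -x not in C2:
                    continue
                r = frozenset((C1 | C2) - {x, -x})
                for C in S - {C1, C2}:
                    if C == r and not ((C1 in learnt or C2 in learnt)
                                       and C not in learnt):
                        return S - {C}
    return S

# S = {{a, not-c}, {c, b}, {not-d, not-c}, {b, a}, {a, d}} with a=1, b=2, c=3, d=4
S = {frozenset({1, -3}), frozenset({3, 2}), frozenset({-4, -3}),
     frozenset({2, 1}), frozenset({1, 4})}
S_after = ere_step(S)   # one redundancy removed; which one depends on iteration order
```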

### 3 GPU Memory and Data Structures

GPU Architecture. Since 2007, NVIDIA has been developing a parallel computing platform called CUDA [31] that allows developers to use GPU resources for general purpose processing. A GPU contains multiple streaming multiprocessors (SMs), each SM consisting of an array of streaming processors (SPs). Every SM can execute multiple threads grouped together in 32-thread scheduling units called *warps*.

A GPU computation can be launched in a program by the *host* (CPU side of a program) by calling a GPU function called a *kernel*, which is executed by the *device* (GPU side of a program). When a kernel is called, it is specified how many threads need to execute it. These threads are partitioned into thread *blocks* of up to 1,024 threads (or 32 warps). Each block is assigned to an SM. All threads together form a *grid*. A hardware warp scheduler evenly distributes the launched blocks to the available SMs. Concerning the memory hierarchy, a GPU has multiple types of memory:


To hide the latency of global memory, one of the best practices is ensuring that the threads perform *coalesced accesses*. When the threads in a warp try to access a consecutive block of 32-bit words, their accesses are combined into a single (coalesced) memory access. Uncoalesced memory accesses can, for instance, be caused by data sparsity or misalignment. Furthermore, we use *unified memory* [31] to store the main data structures that need to be regularly accessed by both the CPU and the GPU. Unified memory creates a pool of managed memory that is shared between the CPU and GPU. This pool is accessible to both sides using the same addresses. Regarding atomicity, a GPU can run *atomic* instructions on both global and shared memory. Such an instruction performs a *read-modify-write* memory operation on one 32-bit or 64-bit word.

(a) container for a clause (b) container for a formula
Fig. 1: Data structures to store a SAT formula on a GPU

Data Structures. To efficiently implement inprocessing techniques for GPU architectures, we designed a new data structure from scratch to keep track of learnt clauses and store other relevant clause information, while keeping the memory consumption as low as possible. Fig. 1 shows the proposed structures to store a clause (denoted by SCLAUSE) and the SAT formula represented in CNF form (denoted by CNF). The state member in Fig. 1a stores the current *clause state*. A clause is either ORIGINAL, LEARNT (see Section 2) or DELETED. A GPU thread is not allowed to deallocate memory; however, a clause can be set to DELETED and freed later during garbage collection. The members added and flag mark the clause as being a resolvent (when applying the resolution rule) and as contributing to a gate (for substitution), respectively. The lbd entry denotes the *literal block distance* (LBD), i.e., the number of decision levels contributing to a conflict [2]. The used counter is used to keep track of how long a LEARNT clause should be used before it gets deleted during database reduction [6, 38]. Both used and lbd can be altered via clause *strengthening* [6] in SUB.

The signature (sig) of a clause is computed by hashing its literals to a 32-bit value [16]. It is used to quickly compare clauses. The first literal in a clause is preallocated and stored in the fixed array literals[1]. As has been done for the MINISAT solver, we adapted the union structure to allow dynamically expanding the literals array. This is accepted by NVIDIA's compiler (NVCC). In our previous work [34], we stored a pointer in each clause referencing the first literal, with the literals being in a separate array. This consumes 8 bytes of the clause space. However, SCLAUSE only needs 4 bytes for the literals array, resulting in the clause occupying 20 bytes in total, including the extra information of the learnt clause, compared to 24 bytes in our previous work.

As implemented in MINISAT, we use the clauses field in CNF (Fig. 1b) to store the raw bytes of SCLAUSE instances with any extra literals in 4-byte buckets, with 64-bit reference support. The cap variable indicates the total memory capacity available for the storage of clauses, and size reflects the current size of the list of clauses. We always have size ≤ cap. The references field is used to directly access the clauses by saving for each clause a reference to its first bucket. The mechanism for storing references works in the same way as for clauses.

In addition, in a similar way, an *occurrence table* structure, denoted by OT, is created, which has a raw pointer storing the 64-bit clause references for each literal in the formula, exposed per literal via a member structure OL. The creation of an OL instance is done in parallel on the GPU for each literal using atomic instructions. For each clause C, a thread is launched to insert the occurrences of C's literals in the associated lists.

Initially, we pre-allocate unified memory for clauses and references that is twice the size of the input formula, to guarantee enough space for the original and learnt clauses. This amount is guaranteed to be enough as we enforce that the number of resolvents never exceeds the number of ORIGINAL clauses. The OT memory is reallocated dynamically if needed after each variable elimination. Furthermore, we check the amount of free GPU memory before allocation is done. If no memory is available, the inprocessing step is skipped and the solving continues on the CPU.

### 4 Parallel Garbage Collection

Modern sequential SAT solvers implement a *garbage collection* (GC) algorithm to reduce memory consumption and maintain data locality [2, 6, 17].

Since GPU global memory is a scarce resource and coalesced accesses are essential to hide the latency of global memory (see Section 2), we decided to develop an efficient and parallel GC algorithm for the GPU without adding overhead to the GPU computations.

Fig. 2 demonstrates the proposed approach for a simple SAT formula S = {{a, b̄, c}, {a, b, c̄}, {d, b̄}, {d̄, b}}, in which {a, b, c̄} is to be deleted. The figure shows, in addition, how the references and clauses lists in Fig. 1b are updated for the given formula. The reference for each clause C is calculated based on the sum of the sizes (in buckets) of all clauses preceding C in the list of clauses. For example, the first clause (C₁) requires α + (k − 1) = 5 + 2 = 7 buckets, where the constant α is the number of buckets needed to store SCLAUSE, in our case 20 bytes / 4 bytes = 5, and k is the clause size in terms of the number of literals. Given the number of buckets needed for C₁, the next clause (C₂) must be stored starting from position 7

Fig. 2: An example of parallel GC on a GPU

in the list of clauses. This position plus the size of C<sup>2</sup> determines in a similar way the starting position for C3, and so on.
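The reference arithmetic can be checked with a short sketch: each clause's reference is the exclusive prefix sum of the bucket counts of the clauses before it, with α = 5 (20 bytes / 4-byte buckets) plus k − 1 extra literal buckets.

```python
ALPHA = 5                      # buckets for one SCLAUSE, including the first literal

def clause_buckets(k):
    """Buckets needed by a clause of k literals: alpha + (k - 1)."""
    return ALPHA + (k - 1)

sizes = [3, 3, 2, 2]           # literal counts of C1..C4 in the example formula
buckets = [clause_buckets(k) for k in sizes]        # [7, 7, 6, 6]

refs, total = [], 0
for b in buckets:              # exclusive prefix sum gives each clause's reference
    refs.append(total)
    total += b
# refs == [0, 7, 14, 20]: C2 starts at bucket 7, C3 at 14, C4 at 20
```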

The first step towards compacting the CNF instance when C<sup>2</sup> is to be deleted is to compute a *stencil* and a list of corresponding clause sizes in terms of numbers of buckets. In this step, each clause C<sup>i</sup> is inspected by a different thread that writes a '0'

#### Algorithm 1: Parallel Garbage Collection

```
Input : global Sin, stencil, buckets, constant α, shared shCls, shLits
Output: numCls, numLits
1  numCls, numLits ← COUNTSURVIVED(Sin);
2  Sout ← ALLOCATE(numCls, numLits);
3  stencil, buckets ← COMPUTESTENCIL(Sin);
4  buckets ← EXCLUSIVESCAN(buckets);
5  references(Sout) ← COMPACTREFS(buckets, stencil);
6  COPYCLAUSES(Sout, Sin, buckets, stencil);
7  kernel COUNTSURVIVED(Sin):
8      register rCls ← 0, rLits ← 0;
9      for all i ∈ [0, |Sin|) in parallel
10         register C ← Sin[i];
11         if state(C) ≠ DELETED then
12             rCls ← rCls + 1, rLits ← rLits + |C|;
13     if tid < |Sin| then
14         shCls[tid] ← rCls, shLits[tid] ← rLits;
15     else
16         shCls[tid] ← 0, shLits[tid] ← 0;
17     SYNCTHREADS();
18     for b ← blockDim/2 down to 1, b ← b/2 do   // b is blockDim/2, (blockDim/2)/2, ..., 1
19         if tid < b then
20             shCls[tid] ← shCls[tid] + shCls[tid + b], shLits[tid] ← shLits[tid] + shLits[tid + b];
21         SYNCTHREADS();
22     if tid = 0 then
23         ATOMICADD(numCls, shCls[tid]), ATOMICADD(numLits, shLits[tid]);
24 kernel COMPUTESTENCIL(Sin):
25     for all i ∈ [0, |Sin|) in parallel
26         register C ← Sin[i];
27         if state(C) = DELETED then
28             stencil[i] ← 0, buckets[i] ← 0;
29         else
30             stencil[i] ← 1, buckets[i] ← α + (|C| − 1);
31 kernel COPYCLAUSES(Sout, Sin, buckets, stencil):
32     for all i ∈ [0, |Sin|) in parallel
33         if stencil[i] then
34             register &Cdest ← (SCLAUSE &)(clauses(Sout) + buckets[i]);
35             Cdest ← Sin[i];
```
at position i of a list named stencil if the clause must be deleted, and a '1' otherwise. The size of stencil is equal to the number of clauses. In a list of the same size, called buckets, the thread writes a '0' at position i if the clause will be deleted, and otherwise the size of the clause in terms of the number of buckets.

At step 2, a parallel *exclusive-segmented scan* operation is applied on the buckets array to compute the new references. In this scan, the value stored at position i, masked by the corresponding stencil, is the sum of the values stored at positions 0 up to, but not including, i. An optimised GPU implementation of this operation is available via the CUDA CUB library [29], which transforms a list of size n in log(n) iterations. In the example, this results in C<sup>3</sup> being assigned reference 7, thereby replacing C2.
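Steps 1 and 2 on the running example (C₂ deleted), plus the compaction of step 3, can be simulated sequentially; on the GPU the scan and compaction are done with CUB, but the arithmetic is the same:

```python
ALPHA = 5
sizes   = [3, 3, 2, 2]                 # literal counts of C1..C4
deleted = [False, True, False, False]  # C2 is to be deleted

# Step 1: stencil and per-clause bucket counts
stencil = [0 if d else 1 for d in deleted]
buckets = [0 if d else ALPHA + (k - 1) for k, d in zip(sizes, deleted)]  # [7,0,6,6]

# Step 2: exclusive scan of buckets yields the candidate references
scan, total = [], 0
for b in buckets:
    scan.append(total)
    total += b                          # scan == [0, 7, 7, 13]

# Step 3: stream compaction keeps only the references flagged by the stencil
refs = [r for r, keep in zip(scan, stencil) if keep]
# refs == [0, 7, 13]: C3 is assigned reference 7, replacing C2
```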

At step 3, the stencil list is used to update the references in parallel, which are kept together in consecutive positions. The standard DeviceSelect::Flagged function of the CUB library can be used for this, which uses stream compaction [10]. Finally, the actual clauses are copied to their new locations in clauses.

Alg. 1 describes in detail the GPU implementation of the parallel GC. As input, Alg. 1 requires a SAT formula Sin as an instance of CNF. The constant α is kept in GPU constant memory for fast access. The highlighted lines in grey are executed on the GPU. To begin GC, we count the number of clauses and literals in the Sin formula after simplification has been applied (line 1). The counting is done via the parallel reduction kernel COUNTSURVIVED, listed at lines 7-23. In kernels, we use two conventions. First, with *tid*, we refer to the *block-local* ID of the executing thread. By using this ID, we ensure that different threads in the same block work on different data, as for instance at lines 13-16. Second, we use so-called *grid-stride loops* to process data elements in parallel. An example of this starts at line 9. The statement "for all i ∈ [0, N) in parallel" expresses that all natural numbers in the range [0, N) must be considered in the loop, and that this is done in parallel by having each executing thread start with element *tid*, i.e., i = *tid*; before starting each additional iteration through the loop, the thread adds to i the total number of threads on the GPU. If the updated i is smaller than N, the next iteration is performed with this updated i. Otherwise, the thread exits the loop. A grid-stride loop ensures that when the range of numbers to consider is larger than the number of threads, all numbers are still processed.
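A host-side simulation of the grid-stride scheme (plain Python standing in for the CUDA kernel) shows how T threads cover N elements:

```python
def grid_stride_indices(tid, num_threads, n):
    """Indices processed by thread `tid` in a grid-stride loop over [0, n)."""
    i = tid
    while i < n:
        yield i
        i += num_threads            # stride by the total number of threads

# With 4 threads over 10 elements, thread 1 handles indices 1, 5, 9;
# together the threads process every element exactly once.
thread1 = list(grid_stride_indices(1, 4, 10))
covered = sorted(i for t in range(4) for i in grid_stride_indices(t, 4, 10))
```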

The values *rCls* and *rLits* at line 8 will hold the current number of clauses and literals, respectively, counted by the executing thread. The register keyword indicates that the variables are stored in the thread-local register memory. Within the loop at lines 9-12, the counters *rCls*, *rLits* are updated incrementally if the clause at position i in clauses is not deleted. Once a thread has checked all its assigned clauses, it stores the counter values in the (block-local) shared memory arrays (*shCls*, *shLits*) at lines 13-14.

A non-participating thread simply writes zeros (line 16). Next, all threads in the block are synchronised by the SYNCTHREADS call. The loop at lines 18-21 performs the actual parallel reduction to accumulate the number of non-deleted clauses and literals in shared memory within thread blocks. In the for loop, b is initially set to the number of threads in the block (*blockDim*), and in each iteration, this value is divided by 2 until it is equal to 1 (note that blocks always consist of a power of two number of threads).
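The tree reduction at lines 18-21 can be simulated sequentially (each inner loop below models one synchronised parallel step over the shared-memory array):

```python
def block_reduce(sh):
    """Tree reduction over a power-of-two-sized array; the result lands in sh[0]."""
    b = len(sh) // 2                # b starts at blockDim/2
    while b >= 1:
        for tid in range(b):        # threads with tid < b run in parallel on the GPU
            sh[tid] += sh[tid + b]
        # SYNCTHREADS() would separate the rounds here
        b //= 2
    return sh[0]

# Reducing the per-thread counters [1..8] accumulates their sum in sh[0]
total = block_reduce([1, 2, 3, 4, 5, 6, 7, 8])
```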

The total number of clauses and literals is in the end stored by thread 0, and this thread adds those numbers using atomic instructions to the globally stored counters *numCls* and *numLits* at line 23, resulting in the final output. In the procedure described here, we prevent having each thread perform atomic instructions on the global memory, by which we avoid a potential performance bottleneck. The computed numbers are used to allocate enough memory for the output formula at line 2 on the CPU side.

The kernel COMPUTESTENCIL, called at line 3, is responsible for checking clause states and computing the number of buckets for each clause. The COMPUTESTENCIL kernel is given at lines 24-30. If a clause C is set to DELETED (line 27), the corresponding entries in stencil and buckets are cleared at line 28, otherwise the stencil entry is set to 1 and the buckets entry is updated with the number of clause buckets.

The EXCLUSIVESCAN routine at line 4 calculates the new references to store the remaining clauses based on the collected buckets. For that, we use the exclusive scan method offered by the CUB library. The COMPACTREFS routine called at line 5 groups the *valid* references, i.e., those flagged by stencil, into consecutive values and stores them in references(Sout), which refers to the references field of the output formula Sout. Finally, copying clause contents (literals, state, etc.) is done in the COPY-CLAUSES kernel, called at line 6. This kernel is described at lines 31-35. If a clause in Sin is flagged by stencil via thread *i*, then a new SCLAUSE reference is created in clauses(Sout), which refers to the clauses field in Sout, offset by buckets[*i*].

The GC mechanism described above resulted from experimenting with several less efficient mechanisms first. In the first attempt, two atomic additions per thread were performed for each clause, one to move the non-deleted clause buckets and the other for moving the corresponding reference. However, the excessive use of atomics resulted in a performance bottleneck and produced a different simplified formula on each run, that is, the order in which the new clauses were stored depended on the outcome of the atomic instructions. The second attempt was to maintain stability by moving the GC to the host side. However, accessing unified memory on the host side results in a performance penalty, as it implicitly results in copying data to the host side.

### 5 Parallel Inprocessing Procedure

To exploit parallelism in simplifications, each elimination method is applied on multiple variables simultaneously. Doing so is non-trivial, since variables may *depend* on each other; two variables x and y are dependent iff there exists a clause C with (x ∈ C ∨ x̄ ∈ C) ∧ (y ∈ C ∨ ȳ ∈ C). If both x and y were to be processed for simplification, two threads might manipulate C at the same time. To guarantee soundness of the parallel simplifications, we apply our *least constrained variable elections* algorithm (LCVE) [34] prior to simplification. It is responsible for electing a set of mutually independent variables (candidates) from a set of authorised candidates. The remaining variables relying on the elected ones are frozen. These notions are defined by Defs. 1-4.

Definition 1 (Authorised candidates). *Given a CNF formula* S*, we call* A *the set of* authorised candidates*:* A = {x | 1 ≤ h[x] ≤ μ ∨ 1 ≤ h[x̄] ≤ μ}*, where*


Definition 2 (Candidate Dependency Relation). *We call a relation* D : A × A *a* candidate dependency relation *iff* ∀x, y ∈ A*,* x D y *implies that* ∃C ∈ S. (x ∈ C ∨ x̄ ∈ C) ∧ (y ∈ C ∨ ȳ ∈ C)

Definition 3 (Elected candidates). *Given a set of authorised candidates* A*, we call a set* ϕ ⊆ A *a set of* elected candidates *iff* ∀x, y ∈ ϕ. ¬(x D y)

Definition 4 (Frozen candidates). *Given the sets* A *and* ϕ*, the set of* frozen candidates F ⊆ A *is defined as* F = {x | x ∈ A ∧ ∃y ∈ ϕ. x D y}
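Under the assumption that dependency simply means sharing a clause (Def. 2), the election of mutually independent candidates can be sketched sequentially as follows; the actual LCVE algorithm [34] operates on the host over the ordered candidate array and the occurrence table:

```python
def lcve(candidates, occurs):
    """Greedily elect mutually independent variables (sketch of Defs. 1-4).
    candidates: authorised variables, ordered by occurrence count;
    occurs[x]:  set of indices of clauses containing x or its negation.
    Two candidates depend on each other iff their clause sets intersect."""
    elected, frozen = [], set()
    for x in candidates:
        if x in frozen:
            continue
        elected.append(x)
        # Freeze every other candidate that shares a clause with x.
        for y in candidates:
            if y != x and y not in frozen and occurs[x] & occurs[y]:
                frozen.add(y)
    return elected, frozen
```

By construction, no two elected variables share a clause, so threads simplifying different elected variables never touch the same clause.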

A top-level description of GPU parallel inprocessing is shown in Alg. 2. The blue-colored lines highlight new contributions of the current work compared to our preprocessing algorithm presented in [34]. As input, it takes the current formula Sh from the solver (executed on the host) and copies it to the device global memory as Sd (line 1).

Initially, before simplification, we compute the clause signatures and order variables via concurrent streams at lines 2-3. A stream is a sequence of instructions that are executed in issue-order on the GPU [31]. The use of concurrent streams allows multiple GPU kernels to run concurrently, if there are enough resources.

#### Algorithm 2: Parallel Inprocessing

The ORDERVARIABLES routine produces an ordered array of authorised candidates A following Def. 1. The while loop at lines 4-16 applies VE, SUB, and BCE for a configured number of iterations (indicated by *phases*), with increasingly large values of the threshold μ. Increasing μ exponentially allows LCVE to elect additional variables in the next elimination phase, since after a phase is executed on the GPU, many elected variables are eliminated. The ERE method is computationally expensive; therefore, it is only executed once, in the final iteration, at line 10. At line 5, SYNCALL is called to synchronize all streams being executed. At line 6, the occurrence table T is created. The LCVE routine produces on the host side an array of elected mutually independent variables ϕ, in line with Def. 3.

The parallel creation of the occurrence lists in T results in the order of these lists being chosen non-deterministically. This results in the ELIMINATE procedure called at line 13, which performs the parallel simplifications, producing results non-deterministically as well. To remedy this effect, the lists in T are sorted according to a unique key in ascending order. Besides the benefit of stability, this allows SUB to abort early when performing subsumption checks. The sorting key function is given as the device function LISTKEY at lines 17-24. It takes two references a, b and fetches the corresponding clauses Ca, Cb from Sd (line 18). First, clause sizes are tested at line 19. If they are equal, the first, the second, and the last literal in each clause are checked, respectively, at lines 20-22. Otherwise, clause signatures are tested at line 23. CADICAL implements a similar function, but only considers clause sizes [6]. The SORTOT routine launches a kernel to sort the lists pointed to by the variables in ϕ in parallel. Each thread runs an insertion sort to swap clause references in place using LISTKEY.
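A Python analogue of such a key (illustrative; the literal encoding and signature values here are assumptions) shows why the resulting order is deterministic regardless of the insertion order produced by parallel threads:

```python
def list_key(clause):
    """Sorting key mimicking LISTKEY: clause size first, then the first,
    second, and last literal, then the clause signature as a tie-breaker.
    clause = (literals, signature); literals assumed non-empty."""
    lits, sig = clause
    second = lits[1] if len(lits) > 1 else lits[0]
    return (len(lits), lits[0], second, lits[-1], sig)

def sort_occurrence_list(clauses):
    # Sorting with a unique key yields the same order on every run.
    return sorted(clauses, key=list_key)
```

Because shorter clauses come first, a subsumption check scanning the sorted list can stop as soon as clauses become larger than the subsuming candidate.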

The ELIMINATE procedure at line 13 calls SUB to remove any subsumed clauses or strengthen clauses if possible, after which VE is applied, followed by BCE. The SUB and BCE methods call kernels that scan the occurrence lists of all variables in ϕ in parallel. For more information on this, see [34]. The VE method uses a new parallel approach, which is explained in Section 6. Both the VE and SUB methods may add new unit clauses atomically to a separate array Ud. The propagation of these units cannot be done immediately on the GPU due to possible data races, as multiple variables in a clause may occur in unit clauses. For instance, if we have unit clauses {a} and {b}, and these would be processed by different threads, then a clause {ā, b̄, c} could be updated by both threads simultaneously. Thus, this propagation is delayed until the next iteration, and performed by the host at line 7. Note that T must be recreated first to consider all resolvents added by VE during the previous phase. The ERE method at line 10 is executed only once, at the last phase (*phases*), before the loop is terminated. Section 7 explains in detail how ERE can be effective in simplifying both ORIGINAL and LEARNT clauses in parallel. At line 14, new units are copied from the device to the host array Uh asynchronously via *stream1*. The COLLECT procedure does the GC as described by Alg. 1 via *stream2*. Both streams are synchronized at line 5.

### 6 Three-Phase Parallel Variable Elimination

The BVIPE algorithm in our previous work [34] had a main shortcoming due to the heavy use of atomic operations to add new resolvents. Per eliminated variable, two atomic instructions were performed, one for adding new clauses and the other for adding new literals. Besides performance degradation, this also resulted in the order of added clauses being chosen non-deterministically, which impacted reproducibility (even though the produced formula would always at least be logically the same).

The approach to avoiding the excessive use of atomic instructions when adding new resolvents is to perform parallel VE in *three phases*. The first phase scans the constructed list ϕ to identify the elimination type (e.g., resolution or gate substitution) of each variable and to calculate the number of resolvents and their corresponding buckets.

The second phase computes an exclusive scan to determine the new references for adding resolvents, as is done in our GC mechanism (Section 4). At the last phase, we store the actual resolvents in their new locations in the simplified formula. For solution reconstruction, we use an atomic addition to count the resolved literals. The order in which they are resolved is irrelevant. The same is done for adding units. For the latter, experiments show that the number of added units is relatively small compared to the eliminated variables, hence the penalty of using atomic instructions is almost negligible. It would be overkill to use a segmented scan for adding literals or units.
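The three phases can be summarized by the following sequential sketch, where `resolvents_of` stands in for RESOLVE/SUBSTITUTE and all names are illustrative:

```python
def three_phase_ve(elected, resolvents_of):
    """Sketch of three-phase variable elimination.
    Phase 1: count resolvents per elected variable.
    Phase 2: exclusive scan turns counts into write offsets.
    Phase 3: each variable writes its resolvents at its precomputed
    offset, so no atomics on the clause store are needed."""
    # Phase 1: per-variable resolvent counts.
    counts = [len(resolvents_of(x)) for x in elected]
    # Phase 2: exclusive scan for destination offsets.
    offsets, total = [], 0
    for c in counts:
        offsets.append(total)
        total += c
    # Phase 3: deterministic writes into preallocated storage.
    out = [None] * total
    for x, off in zip(elected, offsets):
        for j, r in enumerate(resolvents_of(x)):
            out[off + j] = r
    return out
```

Since each variable's write region is fixed by the scan, the output layout is identical on every run, which is exactly the reproducibility property the atomic-based BVIPE algorithm lacked.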

At line 1 of Alg. 3, phase 1 is executed by the VARIABLESWEEP kernel (given at lines 15-27). Every thread scans the clause set of its designated literals x and x̄ (line 17). References to these clauses are stored at Tx and Tx̄. Moreover, register variables t, β, γ are created to hold the current *type*, number of *added clauses*, and number of *added literals* of x, respectively. If x is *pure* at line 19, then there are no resolvents to add and the clause sets of x and x̄ are directly marked as DELETED by the routine TOBLIVION. Moreover, this routine adds the marked literals atomically to resolved. At line 22, we

### Algorithm 3: Three-Phase Parallel Variable Elimination

```
Input : global ϕ, Sd, T , Ud, resolved, type, buckets, added, constant α
 1 resolved, type, buckets, added ← VARIABLESWEEP(ϕ, Sd, T );
 2 lastadded ← −1, lastidx ← −1, lastcref ← −1, lastC ← ∅;
 3 for j : |ϕ| − 1, j − 1 → 0 do   // find index and # resolvents of last eliminated x
 4     if type[j] ≠ 0 then
 5         lastidx ← j, lastadded ← added[j]; break;
 6 buckets ← EXCLUSIVESCAN(buckets, SIZE(clauses), stream0);
 7 added ← EXCLUSIVESCAN(added, SIZE(references), stream1);
 8 SYNCALL();
 9 numCls ← lastadded + added[lastidx];
10 lastcref ← references[numCls − 1], lastC ← clauses[lastcref];
11 numBuckets ← lastcref + (α + SIZE(lastC) − 1);
12 RESIZE(clauses, numBuckets), RESIZE(references, numCls);
13 Sd, Ud ← VARIABLERESOLVENT(ϕ, Sd, T , type, buckets, added);

15 kernel VARIABLESWEEP(ϕ, Sd, T ):
16     for all i ∈ [0, |ϕ|) in parallel
17         register x ← ϕ[i], Tx ← T [x], Tx̄ ← T [x̄], t ← NONE, β ← 0, γ ← 0;
18         type[i] ← 0, buckets[i] ← 0, added[i] ← 0;   // initially reset
19         if Tx = ∅ ∨ Tx̄ = ∅ then                      // check if x is a pure literal
20             resolved ← TOBLIVION(x, Sd, Tx, Tx̄);
21         else
22             t, β, γ ← GATEREASONING(x, Sd, Tx, Tx̄, σ);
23             if t ≠ GATE then
24                 t, β, γ ← MAYRESOLVE(x, Sd, Tx, Tx̄);  // t may be set to RESOLUTION
25             if t ≠ 0 then                             // x can be eliminated
26                 type[i] ← t, added[i] ← β, buckets[i] ← α × β + (γ − β);
27                 resolved ← TOBLIVION(x, Sd, Tx, Tx̄);
28 kernel VARIABLERESOLVENT(ϕ, Sd, T , type, buckets, added):
29     for all i ∈ [0, |ϕ|) in parallel
30         register x ← ϕ[i], Tx ← T [x], Tx̄ ← T [x̄];
31         register t ← type[i], cref ← buckets[i], rpos ← added[i];
32         if t = RESOLUTION then
33             (Sd, Ud) ← (Sd, Ud) ∪ RESOLVE(x, Sd, Tx, Tx̄, rpos, cref);
34         if t = GATE then
35             (Sd, Ud) ← (Sd, Ud) ∪ SUBSTITUTE(x, Sd, Tx, Tx̄, rpos, cref);
```
check first if x contributes to a logical gate using the routine GATEREASONING, and save the corresponding β and γ. If this is the case, the type t is set to GATE, otherwise we try resolution at line 24. The condition β ≤ (|Tx| + |Tx̄|) is tested implicitly by MAYRESOLVE to limit the number of resolvents per x. If t is set to a nonzero value (line 25), the type and added arrays are updated correspondingly. The total number of buckets needed to store all added clauses is calculated by the formula (α × β + (γ − β)) and stored in buckets[*i*] at line 26. After type and added have been completely constructed, the loop at lines 3-4 identifies the index of the last variable eliminated, starting from position |ϕ| − 1. If the condition at line 4 holds, index j and the number of underlying resolvents are saved to *lastidx* and *lastadded*, respectively. These values will be used later to set the new size of the simplified formula Sd on the host side.

Phase 2 is now ready to apply EXCLUSIVESCAN on the added and buckets lists. Both clauses and references refer to the structural members of Sd, as described in Fig. 1b. The procedure at line 6 takes the old size of clauses to offset the calculated references of the added resolvents. The SIZE routine returns the size of the input structure. Similarly, the second call at line 7 takes the old size of references and calculates the new indices for storing new references. Both scans are executed concurrently

#### Algorithm 4: Parallel Eager Redundancy Elimination for Inprocessing


via *stream0* and *stream1*, and are synchronized by the SYNCALL call at line 8. After the exclusive scan, the last element in added gives the total number of clauses in Sd minus the resolvents added by the last eliminated variable. Therefore, adding this value to *lastadded* gives the total number of clauses in Sd (line 9). At line 10, the last clause *lastC* and its reference *lastcref* are fetched. At line 11, the number of buckets of *lastC* is added to *lastcref* to get the total number of buckets *numBuckets*. The *numBuckets* and *numCls* are used to resize clauses and references, respectively, at line 12.

Finally, in phase 3, we use the calculated indices in added and buckets to guide the new resolvents to their locations in Sd. The kernel is described at lines 28-35. Each thread either calls the procedure RESOLVE or SUBSTITUTE, based on the type stored for its designated variable. Any produced units are saved into Ud atomically. The *cref* and *rpos* variables indicate where resolvents should be stored in Sd per variable x.

### 7 Eager Redundancy Elimination

Alg. 4 describes a *two-dimensional* kernel, in which from each thread ID, an x and y coordinate is derived. This allows us to use two nested grid-stride loops. In the loops, we specify which of the two coordinates should be used to initialise i in the first iteration.

Based on the kernel's *y-dimension* ID (line 2), each thread merges, where possible, two clauses of its designated variable x and its complement x̄ (lines 3-6), and writes the result in shared memory as Cm. This new clause is produced by the routine RESOLVE at line 6. At lines 7-10, we check if one of the resolved clauses is LEARNT, and if so, the state *st* of Cm is set to LEARNT as well; otherwise it is set to ORIGINAL. This state of Cm will guide the FORWARDEQUALITY routine called at line 11 to search for redundant clauses of the same type. This routine is a device function, as it can only be called from a kernel, and is described at lines 12-17. In this function, the x-dimension of the thread ID is used to search the clauses referenced by the minimum occurrence list *minList*, which is produced by FINDMINLIST at line 13. This list has the minimum size among the lists of all literals in Cm. If a clause C is found that is equal to Cm and is either LEARNT or has a state equal to that of Cm, it is set to DELETED (line 16).
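A sequential sketch of this search (names and clause representation are ours; the real routine distributes the scan over the x-dimension of the thread grid):

```python
def forward_equality(cm, cm_state, occurs, clauses):
    """Search for a clause equal to resolvent cm and delete it.
    cm: literal list of the resolvent; cm_state: "ORIGINAL" or "LEARNT";
    occurs[lit]: references of clauses containing lit;
    clauses[ref]: (literals, state). Only the shortest occurrence list
    among cm's literals is scanned (cf. FINDMINLIST)."""
    min_lit = min(cm, key=lambda lit: len(occurs.get(lit, [])))
    for ref in occurs.get(min_lit, []):
        lits, state = clauses[ref]
        if state != "DELETED" and sorted(lits) == sorted(cm) and \
                (state == "LEARNT" or state == cm_state):
            clauses[ref] = (lits, "DELETED")  # redundant clause found
            return ref
    return None
```

Scanning only the shortest occurrence list keeps the per-resolvent work small, since any clause equal to Cm must occur in the occurrence list of every one of its literals.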

Fig. 3: Speedup of the proposed VE and GC algorithms on the benchmark suite

### 8 Experiments

We implemented the proposed algorithms in PFROST-GPU<sup>3</sup> with CUDA C++ version 11.0 [31]. We evaluated all GPU experiments on an NVIDIA Titan RTX GPU. This GPU has 72 SMs (64 cores each), 24 GB global memory and 48 KB shared memory. The GPU operates at a base clock of 1.3 GHz (boost: 1.7 GHz). The GPU machine was running Linux Mint v20 with an Intel Core i5-7600 CPU of 3.5 GHz base clock speed (turbo: 4.1 GHz) and a system memory of 32 GB.

We selected 493 SAT problems from the 2013-2020 SAT competitions. All formulas larger than 5 MB in size were chosen, excluding redundancies (repeated CNFs across competitions). For very small problems, the GPU is not really needed, as only a few variables and clauses can be removed. The selected problems encode more than 70 different real-world applications, with various logical properties.

In the experiments, besides the implementations of our new GPU algorithms, we involved a CPU-only version of PARAFROST (PFROST-CPU) and the CADICAL [6] SAT solver for the solving of problems, and executed these on the compute nodes of the Lisa CPU cluster<sup>4</sup>. Each problem was analysed in isolation on a separate computing node. Each computing node had an Intel Xeon Gold 6130 CPU running at a base clock speed of 2.1 GHz (turbo: 3.7 GHz) with 96 GB of system memory, and ran the Debian Linux operating system. With this information, we adhere to all five principles laid out in the SAT manifesto (version 1) [9], noting that we also included problems older than three years, to have a sufficient number of large problems to work with.

SAT-Simplification Speedup. Figure 3 presents the performance evaluation of the GPU Algorithms 1 and 3 compared to their previous implementations in SIGMA [34]. For these experiments, we set μ and *phases* initially to 32 and 5, respectively. Only preprocessing is enabled, to measure the speedup. Fig. 3a shows the speedup of running parallel GC against a sequential version on the host. Clearly, for almost all cases, Alg. 1 achieved a drastic acceleration when executed on the device, with a maximum speedup of 93× and an average of 48×. Fig. 3b reveals how fast the 3-phase parallel VE is

<sup>3</sup> Solvers/formulas are available at https://gears.win.tue.nl/software/parafrost.

<sup>4</sup> This work was carried out on the Dutch national e-infrastructure with the support of SURF Cooperative.

compared to the previous version, which uses more atomic instructions. On average, the new algorithm is twice as fast as the old BVIPE algorithm [34]. In addition, we get reproducible results.

SAT-Solving. These experiments provide a thorough assessment of our CPU/GPU solver, the CPU-only version, and CADICAL on SAT solving with preprocessing + inprocessing turned on. The features *walksat*, *vivification* and *probing* [6] are disabled in CADICAL as they are not yet supported in PARAFROST. As in PARAFROST, all elimination methods in CADICAL are turned on with a bound on the occurrence list size set to 30,000. The same parameters for the search heuristics are used for all experiments. However, we delay the scheduling of inprocessing in PARAFROST until 4,000 of the fixed (root) variables are removed. The occurrence limit μ is bounded by 32 in CADICAL. On the other hand, we start with 32 and double this value every new *phase* as shown in Alg. 2. These extensions increase the likelihood of doing more work on the GPU. The timeout for all experiments is set to 5,000 seconds. The timeout for the sequential solvers has a 6% tolerance (i.e., is 5,300 seconds in total) to compensate for the different CPU frequencies of the GPU machine and the cluster nodes.

Figure 4 demonstrates the runtime results for all solvers over the benchmark suite. Subplot (a) shows the total time (simplification + solving) for all formulas. Data are sorted w.r.t. the x-axis. The simplification time accounts for data transfers in PFROST-GPU. Overall, PFROST-GPU outperforms both PFROST-CPU and CADICAL. Subplot (b) demonstrates the solving impact of PFROST-GPU versus CADICAL on SAT/UNSAT formulas. PFROST-GPU appears more effective on UNSAT formulas than CADICAL. Collectively, PFROST-GPU performed faster on 196 instances (58% of all solved), of which 18 formulas were unsolved by CADICAL.

Subplots (c) and (d) show simplification time and its percentage of the total processing time, respectively. Clearly, the CPU/GPU solver outperforms its sequential counterpart due to the parallel acceleration. Plot (d) tells us that PFROST-GPU keeps the simplification workload in the region between 0 and 20%, as the elimination methods are scheduled on a large set of mutually independent variables in parallel. In CADICAL, variables and clauses are simplified sequentially, which takes more time. Plot (e) shows the effectiveness of ERE on formulas with successful clause reductions. The last plot (f) reflects the overall efficiency of parallel inprocessing on variables and clauses (learnt clauses are included). Data are sorted in descending order. Reductions can remove up to 90% of the variables and 80% of the clauses.

### 9 Related Work

A simple GC monitor for GPU term rewriting has been proposed by van Eerd *et al.* [18]. The monitor tracks deleted terms and stores their indices in a list. New terms can be added at those indices. The authors of [1, 26] investigated the challenges of offloading garbage collectors to an Accelerated Processing Unit (APU). Matthias *et al.* [39] introduced a promising alternative to stream compaction [10] via parallel defragmentation on GPUs. Our GC, on the other hand, is tailored to SAT solving, which allows it to be simple yet efficient. Regarding inprocessing, Järvisalo *et al.* [23] introduced certain rules to determine how and when inprocessing techniques can be applied. Acceleration of the DPLL SAT solving algorithm on a GPU has been done in [15], where

Fig. 4: SAT Solving Statistics

some parts of the search were performed on a GPU and the remainder was handled by the CPU. Incomplete approaches are more amenable to being executed entirely on a GPU, e.g., an approach using metaheuristic algorithms [44]. We are the first to work on GPU inprocessing in modern CDCL solvers.

### 10 Conclusion

We have shown that GPU-accelerated inprocessing significantly reduces simplification time in SAT solving, allowing more problems to be solved. Parallel ERE and VE can be performed efficiently on many-core systems, producing impactful reductions on both original and learnt clauses in a fraction of a second, even for large problems. The proposed parallel GC achieves a substantial speedup in compacting SAT formulas on a GPU, while stimulating coalesced accessing of clauses.

Concerning future work, the results suggest taking the capabilities of GPU inprocessing further by supporting more simplification techniques.

### References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### Forest: An Interactive Multi-tree Synthesizer for Regular Expressions*

Margarida Ferreira<sup>1,2</sup>, Miguel Terra-Neves<sup>2</sup>, Miguel Ventura<sup>2</sup>, Inês Lynce<sup>1</sup>, and Ruben Martins<sup>3</sup>

<sup>1</sup> INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal {margaridaacferreira, ines.lynce}@tecnico.ulisboa.pt
<sup>2</sup> OutSystems, Linda-a-Velha, Portugal {miguel.neves, miguel.ventura}@outsystems.com
<sup>3</sup> Carnegie Mellon University, Pittsburgh, USA rubenm@cs.cmu.edu

Abstract. Form validators based on regular expressions are often used in digital forms to prevent users from inserting data in the wrong format. However, writing these validators can pose a challenge to some users. We present Forest, a regular expression synthesizer for digital form validations. Forest produces a regular expression that matches the desired pattern for the input values and a set of conditions over capturing groups that ensure the validity of integer values in the input. Our synthesis procedure is based on enumerative search and uses a Satisfiability Modulo Theories (SMT) solver to explore and prune the search space. We propose a novel representation for regular expression synthesis, multi-tree, which induces patterns in the examples and uses them to split the problem through a divide-and-conquer approach. We also present a new SMT encoding to synthesize capture conditions for a given regular expression. To increase confidence in the synthesized regular expression, we implement user interaction based on distinguishing inputs.

We evaluated Forest on real-world form-validation instances using regular expressions. Experimental results show that Forest successfully returns the desired regular expression in 70% of the instances and outperforms Regel, a state-of-the-art regular expression synthesizer.

### 1 Introduction

Regular expressions (also known as regexes) are powerful mechanisms for describing patterns in text, with numerous applications. One notable use of regexes is to perform real-time validations on the input fields of digital forms. Regexes help filter invalid values, such as typographical mistakes ('typos') and format inconsistencies. Aside from validating the format of form input strings, regular expressions can be coupled with capturing groups. A capturing group is a sub-regex within a regex that is indicated with parentheses and captures the text

<sup>*</sup> This work was supported by NSF award CCF-1762363 and through FCT under project UIDB/50021/2020, and project ANI 045917 funded by FEDER and FCT.

<sup>©</sup> The Author(s) 2021

J. F. Groote and K. G. Larsen (Eds.): TACAS 2021, LNCS 12651, pp. 152–169, 2021. https://doi.org/10.1007/978-3-030-72016-2\_9

matched by the sub-regex inside them. Capturing groups are used to extract information from text and, in the domain of form validation, they can be used to enforce conditions over values in the input string. In this paper, we focus on the capture of integer values in input strings, and we use the notation \$i, i ∈ {0, 1, ...}, to refer to the integer value of the text captured by the (i + 1)-th group.
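For example, using Python's `re` module (an illustration of the notation, not Forest's implementation), \$0 and \$1 correspond to the integer values of the first two capturing groups of a match:

```python
import re

# $0 denotes the integer captured by the first group, $1 by the second.
m = re.fullmatch(r"([0-9]{2})/([0-9]{2})/[0-9]{4}", "26/10/1998")
dollar0 = int(m.group(1))  # day   -> 26
dollar1 = int(m.group(2))  # month -> 10
```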

Form validations often rely on complex regexes which require programming skills that not all users possess. To help users write regexes, prior work has proposed to synthesize regular expressions from natural language [1,9,12,27] or from positive and negative examples [1,7,10,26]. Even though these techniques assist users in writing regexes for search and replace operations, they do not specifically target digital form validation and do not take advantage of the structured format of the data.

In this paper, we propose Forest, a new program synthesizer for regular expressions that targets digital form validations. Forest takes as input a set of examples and returns a regex validation. Forest accepts three types of examples: (i) valid examples: correct values for the input field, (ii) invalid examples: incorrect values for the input field due to their *format*, and (iii) conditional invalid examples (optional): incorrect values for the input field due to their *values*. Forest outputs a regex validation, consisting of two components: (i) a regular expression that matches all valid and none of the invalid examples and (ii) capture conditions that express integer conditions that are satisfied by the values on all the valid but none of the conditional invalid examples.

Motivating Example. Suppose a user is writing a form where one of the fields is a date that must respect the format DD/MM/YYYY. The user wants to accept:


But not:

26-10-1998 1/12/2001 2015/08/31

A regular expression can be used to enforce this format. Instead of writing it, the user may simply provide the two sets of values as *valid* and *invalid* input examples to Forest, which will output the regex [0-9]{2}/[0-9]{2}/[0-9]{4}.

Additionally, if the user wants to validate not only the format, but also the values in the date, we can consider as *conditional invalid* the examples:


Forest will output a regex validation complete with conditions over capturing groups that ensure only valid values are inserted as the day and month: ([0-9]{2})/([0-9]{2})/[0-9]{4}, \$0 ≤ 31 ∧ \$0 ≥ 1 ∧ \$1 ≤ 12 ∧ \$1 ≥ 1.
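A hand-written validator implementing this synthesized regex validation could look as follows (a sketch; the function name is ours):

```python
import re

# Regex validation for DD/MM/YYYY dates: the regular expression checks
# the format, the capture conditions check the values ($0 = day, $1 = month).
PATTERN = re.compile(r"([0-9]{2})/([0-9]{2})/[0-9]{4}")

def valid_date_field(s):
    m = PATTERN.fullmatch(s)
    if not m:
        return False                                  # wrong format
    day, month = int(m.group(1)), int(m.group(2))
    return 1 <= day <= 31 and 1 <= month <= 12        # capture conditions
```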

As we can see in the motivating example, data inserted into digital forms is usually structured and shares a common pattern among the valid examples. In this example, the data has the shape dd/dd/dddd where d represents a digit. This

Figure 1: Regex synthesis

contrasts with general regexes for search and replace operations that are often performed over unstructured text. Forest takes advantage of this structure by automatically detecting these patterns and using a divide-and-conquer approach to split the expression into simpler sub-expressions, solving them independently, and then merging their information to obtain the final regular expression. Additionally, Forest computes a set of capturing groups over the regular expression, which it then uses to synthesize integer conditions that further constrain the accepted values for that form field.

Input-output examples do not require specialized knowledge and are accessible to users. However, there is one downside to using examples as a specification: they are ambiguous. There can be solutions that, despite matching the examples, do not produce the desired behavior in situations not covered in them. The ambiguity of input-output examples raises the necessity of selecting one among multiple candidate solutions. To this end, we incorporate a user interaction model based on distinguishing inputs for both the synthesis of the regular expressions and the synthesis of the capture conditions.

In summary, this paper makes the following contributions:


### 2 Synthesis Algorithm Overview

The task of automatically generating a program that satisfies some desired behavior expressed as a high-level specification is known as Program Synthesis. Programming by Example (PBE) is a branch of Program Synthesis where the desired behavior is specified using input-output examples.

Our synthesis procedure is split into two stages, each relative to an output component. First, Forest synthesizes the regular expression, which is the basis

Figure 2: Interactive enumerative search

for the synthesis of capturing groups. Secondly, Forest synthesizes the capture conditions, by first computing a set of capturing groups and then the conditions to be applied to the resulting captures. The synthesis stages are detailed in sections 3 and 4. Figure 1 shows the regex validation synthesis pipeline. Both stages of our synthesis algorithm employ enumerative search, a common approach to solve the problem of program synthesis [4,5,10,17,21]. The enumerative search cycle is depicted in Figure 2.

There are two key components for program enumeration: the *enumerator* and the *verifier*. The *enumerator* successively enumerates programs from a predefined Domain-Specific Language (DSL). Following Occam's razor, programs are enumerated in increasing order of complexity. The DSL defines the set of operators that can be used to build the desired program. Forest dynamically constructs its DSL to fit the problem at hand: it is as restricted as possible, without losing the necessary expressiveness. The regular expression DSL construction procedure is detailed in section 3.1.
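The enumerate-and-verify cycle can be illustrated with a deliberately naive Python enumerator (plain concatenations of DSL atoms, not Forest's multi-tree representation; all names are ours):

```python
import re
from itertools import count, product

def verify(regex, valid, invalid):
    """A candidate is correct iff it fully matches every valid example
    and none of the invalid ones."""
    try:
        prog = re.compile(regex)
    except re.error:
        return False
    return (all(prog.fullmatch(v) for v in valid)
            and not any(prog.fullmatch(i) for i in invalid))

def enumerate_and_verify(atoms, valid, invalid, max_len=4):
    """Toy enumerator: concatenations of DSL atoms, shortest first
    (Occam's razor: simpler programs are tried before complex ones)."""
    for n in count(1):
        if n > max_len:
            return None
        for parts in product(atoms, repeat=n):
            candidate = "".join(parts)
            if verify(candidate, valid, invalid):
                return candidate
```

A real synthesizer additionally prunes whole families of programs whenever a candidate fails, which this sketch omits.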

For each enumerated program, the *verifier* subsequently checks whether it satisfies the provided examples. Program synthesis applications generate very large search spaces; nevertheless, the search space can be significantly reduced by pruning several infeasible expressions along with each incorrect expression found. In the first stage of the regex validation synthesis, the enumerated programs are regular expressions. The enumeration and pruning of regular expressions is described in section 3.2. In the second stage of regex validation synthesis, we deal with the enumeration of capturing groups over a pre-existing regular expression. This process is described in section 4.1.

To circumvent the ambiguity of input-output examples, Forest implements an interaction model. A new component, the *distinguisher*, ascertains, for any two given programs, whether they are equivalent. When Forest finds two different validations that satisfy all examples, it creates a *distinguishing input*: a new input that has a different output for each validation. To disambiguate between two programs, Forest shows the new input to the user, who classifies it as valid or invalid, effectively choosing one program over the other. The new input-output pair is added to the examples, and the enumeration process continues until there is only one solution left. This interactive cycle is described for the synthesis of regular expressions in section 3.3 and capture conditions in section 4.3.
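The role of the distinguisher can be approximated by a brute-force search for an input the two candidates classify differently (a sketch over a small bounded alphabet; Forest derives such inputs with an SMT solver rather than by enumeration):

```python
import re
from itertools import product

def distinguish(regex_a, regex_b, alphabet="01/ab", max_len=3):
    """Return a string that exactly one of the two regexes fully matches,
    or None if they agree on all strings up to max_len (in which case
    they are treated as equivalent on this bounded domain)."""
    ra, rb = re.compile(regex_a), re.compile(regex_b)
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(ra.fullmatch(s)) != bool(rb.fullmatch(s)):
                return s   # distinguishing input: user decides valid/invalid
    return None
```

Showing the returned string to the user and recording their answer adds a new example that rules out one of the two candidates.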

Figure 3: [0-9]{2}/[0-9]{2}/[0-9]{4} represented as a k-tree with k = 2

### 3 Regular Expressions Synthesis

In this section we describe the enumerative synthesis procedure that generates a regular expression that matches all valid examples and none of the invalid.

#### 3.1 Regular Expressions DSL

Before the synthesis procedure starts, we define which operators can be used to build the desired regular expression and the values each operator can take as argument. Forest's regular expression DSL includes the regex union and concatenation operators, as well as several regular expression quantifiers:


The possible values for the range operators are limited depending on the valid examples provided by the user. For the single-valued range operator, r{m}, we consider only the integer values such that 2 ≤ m ≤ l, where l is the length of the longest valid example string. In the two-valued range operator, r{m, n}, the values of m and n are limited to integers such that 0 ≤ m < n ≤ l. The tuple (0, 1) is not considered, since it is equivalent to the option quantifier: r{0, 1} = r?.
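These bounds can be sketched as follows (an illustrative helper with a name of our choosing, not Forest's actual code):

```python
def range_quantifier_values(valid_examples):
    """Candidate arguments for the range quantifiers r{m} and r{m,n},
    bounded by the length l of the longest valid example (a sketch of
    the DSL restriction described above)."""
    l = max(len(s) for s in valid_examples)
    single = list(range(2, l + 1))                  # r{m}: 2 <= m <= l
    double = [(m, n) for m in range(0, l + 1)
                     for n in range(m + 1, l + 1)
              if (m, n) != (0, 1)]                  # r{m,n}: 0 <= m < n <= l, minus (0,1), which is r?
    return single, double
```

For the date examples of the motivating example (longest string of length 10), this yields m ∈ {2, ..., 10} for r{m}, and all pairs 0 ≤ m < n ≤ 10 except (0, 1) for r{m, n}.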

All operators can be applied to regex literals or composed with each other to form more complex expressions. The regex literals considered in the synthesis procedure include the individual letters, digits or symbols present in the examples and all character classes that contain them. The character classes contemplated in the DSL are [0-9], [A-Z], [a-z] and all combinations of those, such as [A-Za-z] or [0-9A-Za-z]. Additionally, [0-9A-F] and [0-9a-f] are used to represent hexadecimal numbers.

#### 3.2 Regex Enumeration

To enumerate regexes, the synthesizer requires a structure capable of representing every feasible expression. We use a tree-based representation of the search space. A k-tree of depth d is a tree in which every internal node has exactly k children and every leaf node is at depth d. A program corresponds to an assignment of a DSL construct to each tree node; the node's descendants are the construct's arguments. If k is the greatest arity among all DSL constructs, then a k-tree of depth d can represent all programs of depth up to d in that DSL. The arity of constructs in Forest's regex DSL is at most 2, so all regexes in the search space can be represented using 2-trees. To allow constructs with arity smaller than k, some children nodes are assigned the *empty* symbol, ε. In Figure 3, the regex from the motivating example, [0-9]{2}/[0-9]{2}/[0-9]{4}, is represented as a 2-tree of depth 5.

Figure 4: [0-9]{2}/[0-9]{2}/[0-9]{4} represented as a multi-tree with n = 5 and k = 2, resulting from the concatenation of 5 simpler regexes

To explore the search space in order of increasing complexity, we enumerate k-trees of lower depths first and progressively increase the depth of the trees as previous depths are exhausted. The enumerator encodes the k-tree as an SMT formula that ensures the program is well-typed. A model that satisfies the formula represents a valid regex. Due to space constraints we omit the k-tree encoding but further details can be found in the literature [2,17].
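The shape of this search space can be pictured with a small sketch (hypothetical names, not Forest's implementation): a full k-tree whose nodes are initially unassigned, with 2^d − 1 nodes for k = 2.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class KTreeNode:
    """One node of a full k-tree: a DSL construct (or None for the
    empty symbol) plus exactly k children for internal nodes."""
    construct: Optional[str]
    children: List["KTreeNode"]

def full_ktree(k: int, depth: int) -> KTreeNode:
    """Build an unassigned full k-tree of the given depth."""
    if depth == 1:
        return KTreeNode(None, [])
    return KTreeNode(None, [full_ktree(k, depth - 1) for _ in range(k)])

def size(node: KTreeNode) -> int:
    """Total number of nodes in the tree."""
    return 1 + sum(size(c) for c in node.children)
```

A 2-tree of depth 5, as used for the date regex in Figure 3, has 2^5 − 1 = 31 nodes; the SMT encoding assigns one DSL construct per node.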

Multi-tree representation. We considered several validators for digital forms and observed that many regexes in this domain are the concatenation of relatively simple regexes. However, the successive concatenation of simple regexes quickly becomes complex in its k-tree representation. Recall the regex for date validation presented in the motivating example: [0-9]{2}/[0-9]{2}/[0-9]{4}. Even though this is the concatenation of 5 simple sub-expressions, each of depth at most 2, its representation as a k-tree has depth 5, as shown in Figure 3.

The main idea behind the multi-tree construct is to let the number of concatenated sub-expressions grow without the encoding growing exponentially, as it would with ever-deeper k-trees. The multi-tree structure consists of n k-trees whose roots are connected by an artificial root node, interpreted as an n-ary concatenation operator. This way, we are able to represent regexes using fewer nodes. Figure 4 shows the multi-tree representation of the same regex as Figure 3; the multi-tree construct represents this expression using half the nodes.

The k-tree enumerator successively explores k-trees of increasing depth. However, multi-tree has two measures of complexity: the depth of the trees, d, and the number of trees, n. Forest employs two different methods for increasing these values: static multi-tree and dynamic multi-tree.

Static multi-tree. In the static multi-tree method, the synthesizer fixes n and progressively increases d. To find the value of n, there is a preprocessing step, in which Forest identifies patterns in the valid examples. This is done by first identifying substrings common to all examples. A substring is considered a dividing substring if it occurs exactly the same number of times and in the same order in all examples. Then, we split each example before and after the dividing substrings. Each example becomes an array of n strings.

*Example 1.* Consider the valid examples from the motivating example. In these examples, '/' is a dividing substring because it occurs in every example, and exactly twice in each one. '0' is a common substring but not a dividing substring because it does not occur the same number of times in all examples. After splitting on '/', each example becomes a tuple of 5 strings:


Then, we apply the multi-tree method with n trees. For every i ∈ {1, ..., n}, the i-th sub-tree represents a regex that matches all strings in the i-th position of the split example tuples, and the concatenation of the n regexes matches the original example strings. Since each tree synthesizes only a part of the original input strings, a reduced DSL is recomputed for each tree.
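The splitting step can be sketched as follows. This is a simplified, single-character version of the preprocessing described above; Forest additionally checks that occurrences appear in the same order and handles multi-character dividing substrings.

```python
import re

def dividing_substrings(examples):
    """Simplified sketch: a character is dividing if it occurs the same
    positive number of times in every valid example."""
    divs = []
    for ch in dict.fromkeys(examples[0]):       # preserve first-seen order
        counts = [s.count(ch) for s in examples]
        if counts[0] > 0 and len(set(counts)) == 1:
            divs.append(ch)
    return divs

def split_examples(examples, divider):
    """Split each example around the divider, keeping the divider as its
    own field, so each example becomes a tuple of n strings."""
    return [tuple(re.split("(%s)" % re.escape(divider), s)) for s in examples]
```

For date examples, only '/' survives the count check, and each example becomes a 5-tuple such as ("19", "/", "08", "/", "1996").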

Dynamic multi-tree. The dynamic multi-tree method is employed when the examples cannot be split because there are no dividing substrings. In this scenario, the enumerator still uses a multi-tree construct to represent the regex. However, the number of trees is not fixed, and all trees use the original, complete DSL. A multi-tree structure with n k-trees of depth d has n × (k<sup>d</sup> − 1) nodes. Forest enumerates trees with different values of (n, d) in increasing order of number of nodes, starting with n = 1 and d = 2, a simple k-tree of depth 2.
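The order in which (n, d) shapes are visited can be sketched with a small priority queue (an illustration of the enumeration order, not Forest's actual enumerator; the node-count formula is the one stated above):

```python
import heapq

def multitree_shapes(k=2, max_nodes=64):
    """Yield (n, d, nodes) multi-tree shapes in increasing order of
    node count n * (k**d - 1), starting from n = 1, d = 2."""
    heap = [(1 * (k**2 - 1), 1, 2)]
    seen = {(1, 2)}
    while heap:
        nodes, n, d = heapq.heappop(heap)
        if nodes > max_nodes:
            continue
        yield n, d, nodes
        for n2, d2 in ((n + 1, d), (n, d + 1)):
            if (n2, d2) not in seen:
                seen.add((n2, d2))
                heapq.heappush(heap, (n2 * (k**d2 - 1), n2, d2))
```

With k = 2 the first shapes are (1, 2) with 3 nodes, (2, 2) with 6, (1, 3) with 7, and (3, 2) with 9.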

Pruning. We prune regexes which are provably equivalent to others in the search space by using algebraic rules of regular expressions like the following:

$$\begin{aligned} (r\*)\* &\equiv r\* & (r?)? &\equiv r? & (r+)+ &\equiv r+\\ (r+)\* &\equiv (r\*)+ \equiv r\* & (r?)\* &\equiv (r\*)? \equiv (r+)? \equiv r\*\\ (r\*)\{m\} &\equiv (r\{m\})\* & (r+)\{m\} &\equiv (r\{m\})+ & (r?)\{m\} &\equiv (r\{m\})?\\ & & r\{n\}\{m\} &\equiv r\{m\}\{n\} \equiv r\{m\times n\} \end{aligned}$$

To prevent the enumeration of equivalent regular expressions, we add SMT constraints that block all but one possible representation of each regex. Take, for example, the equivalence (r?)+ ≡ r\*. We want to consider only one way to represent this regex, so we add a constraint blocking the construction (r?)+ for any regex r. Another such equivalence results from the idempotence of union: r|r ≡ r. To prevent the enumeration of expressions of the form r|r, every time the union operator is assigned to a node i, we force the sub-tree under i's left child to differ from the sub-tree under i's right child in at least one node. When we enumerate a regex that is not consistent with the examples, it is eliminated from the search space. Along with the incorrect regex, we also want to eliminate regexes that are equivalent to it. The union operator in the regular expression DSL is commutative: r|s ≡ s|r, for any regexes r and s. Thus, whenever an expression containing r|s is discarded, we also eliminate the expression that contains s|r in its place.

#### 3.3 Regex Disambiguation

To increase confidence in the synthesizer's solution, Forest disambiguates the specification by interacting with the user. We employ an interaction model based on distinguishing inputs, which has been successfully used in several synthesizers [11,24,25,14]. To produce a distinguishing input, we require an SMT solver with a regex theory, such as Z3 [15,23]. Upon finding two regexes r<sub>1</sub> and r<sub>2</sub> that satisfy the user-provided examples, we use the SMT solver to solve the formula:

$$\exists s: r\_1(s) \neq r\_2(s),\tag{1}$$

where r<sub>1</sub>(s) (resp. r<sub>2</sub>(s)) is True if and only if r<sub>1</sub> (resp. r<sub>2</sub>) matches the string s. A string s that satisfies (1) is a distinguishing input. Forest asks the user to classify this input as valid or invalid, and s is added to the respective set of examples, thus eliminating either r<sub>1</sub> or r<sub>2</sub> from the search space. After the first interaction, the synthesis procedure continues only until the end of the current depth and number of trees.
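Forest answers query (1) with Z3's theory of regular expressions; purely for illustration, the same query can be approximated by brute force with Python's `re` module (the regexes and alphabet below are our own examples):

```python
import itertools
import re

def distinguishing_input(r1, r2, alphabet="0123456789/", max_len=4):
    """Brute-force stand-in for SMT query (1): return a shortest string s
    over the alphabet with r1(s) != r2(s), or None if none exists up to
    max_len. Forest instead asks an SMT solver for such an s."""
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(re.fullmatch(r1, s)) != bool(re.fullmatch(r2, s)):
                return s
    return None
```

For the candidates [0-9]{2}/[0-9]{2} and [0-9]{2}/[0-9]{1,2}, any string matched by exactly one of them, such as a day followed by a single-digit field, distinguishes the two.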

### 4 Capturing Groups Synthesis

In this section we describe the synthesis procedure of the second component of a regex validation: a set of integer conditions over captured values that are satisfied by all valid examples but none of the conditional invalid examples.

#### 4.1 Capturing Groups Enumeration

To enumerate capturing groups, Forest starts by identifying the regular expression's atomic sub-regexes: the smallest sub-regexes whose concatenation results in the original complete regex. For example, [0-9]{2} is an atomic sub-regex: there are no smaller sub-regexes whose concatenation results in it. It does not make sense to place a capturing group inside atomic sub-regexes: ([0-9]){2} does not have a clear meaning. Once identified, the atomic sub-regexes are placed in an ordered list. Enumerating capturing groups over the regular expression is done by enumerating non-empty disjoint sub-lists of this list. The elements inside each sub-list form a capturing group.

*Example 2.* Recall the date regex: [0-9]{2}/[0-9]{2}/[0-9]{4}. The respective list of atomic sub-regexes is [[0-9]{2}, /, [0-9]{2}, /, [0-9]{4}]. The following are examples of sub-lists of the atomic sub-regexes list and their resulting capturing groups:

[[[0-9]{2}], /, [0-9]{2}, /, [0-9]{4}] → ([0-9]{2})/[0-9]{2}/[0-9]{4}

[[[0-9]{2}], /, [[0-9]{2}], /, [[0-9]{4}]] → ([0-9]{2})/([0-9]{2})/([0-9]{4})
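The enumeration of disjoint, non-empty sub-lists can be sketched as follows (an illustrative reimplementation with our own function names, not Forest's code):

```python
import itertools

def capture_placements(atoms, num_groups):
    """Enumerate placements of num_groups capturing groups as disjoint,
    non-empty, contiguous sub-lists (i, j) of the atomic sub-regex list."""
    n = len(atoms)
    spans = [(i, j) for i in range(n) for j in range(i + 1, n + 1)]
    for combo in itertools.combinations(spans, num_groups):
        # spans are generated sorted by start, so checking consecutive
        # spans suffices for pairwise disjointness
        if all(combo[k][1] <= combo[k + 1][0] for k in range(len(combo) - 1)):
            yield combo

def to_regex(atoms, combo):
    """Render a placement as a regex with parenthesized capturing groups."""
    out, k, i = [], 0, 0
    while i < len(atoms):
        if k < len(combo) and combo[k][0] == i:
            out.append("(" + "".join(atoms[i:combo[k][1]]) + ")")
            i = combo[k][1]
            k += 1
        else:
            out.append(atoms[i])
            i += 1
    return "".join(out)
```

Applied to the atomic sub-regexes of the date regex, this enumeration produces, among others, the two placements shown in Example 2.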

#### 4.2 Capture Conditions Synthesis

To compute capture conditions, we need all conditional invalid examples to be matched by the regular expression. Afterwards, capturing groups are enumerated as described in section 4.1. The number of necessary capturing groups is not known beforehand, so we enumerate capturing groups in increasing number.

A capture condition is a 3-tuple: it contains the captured text, an integer comparison operator and an integer argument. Forest considers only two integer comparison operators, ≤ and ≥; however, the algorithm can be easily extended to other operators. Let C be a set of capturing groups and C(x) the integer captures that result from applying C to example string x. Let D<sub>C</sub> be the set of all possible capture conditions over capturing groups C; D<sub>C</sub> results from combining each capturing group with each integer operator. Finally, let V be the set of all valid examples, I the set of all conditional invalid examples, and X = V ∪ I the union of these two sets.

Given capturing groups C, Forest uses Maximum Satisfiability Modulo Theories (MaxSMT) to select from D<sub>C</sub> the minimum set of conditions that are satisfied by all valid examples and none of the conditional invalid ones. To encode the problem, we define two sets of Boolean variables. First, we define s<sub>cap,x</sub> for every *cap* ∈ C(x) and x ∈ X: s<sub>cap,x</sub> = True if capture *cap* in example x satisfies all used conditions that refer to it. We also define u<sub>cond</sub> for all *cond* ∈ D<sub>C</sub>: u<sub>cond</sub> = True means condition *cond* is used in the solution. Additionally, we define a set of integer variables b<sub>cond</sub>, for all conditions *cond* ∈ D<sub>C</sub>, that represent the integer argument present in each condition.

Let SMT(*cond*, x) be the SMT representation of condition *cond* for example x: the capture is an integer value, and the integer argument is the corresponding b<sub>cond</sub> variable. Let D<sub>cap</sub> ⊆ D<sub>C</sub> be the set of capture conditions that refer to capture *cap*. Constraint (2) states that a capture *cap* in example x satisfies all conditions if and only if every condition that refers to *cap* is either not used in the solution or satisfied for the value of that capture in that example:

$$s\_{cap,x} \leftrightarrow \bigwedge\_{cond \in \mathcal{D}\_{cap}} u\_{cond} \to \text{SMT}(cond, x). \tag{2}$$

*Example 3.* Recall the first valid string from the motivating example: x<sub>0</sub> = "19/08/1996". Suppose Forest has already synthesized the desired regular expression and enumerated a capturing group that corresponds to the day: ([0-9]{2})/[0-9]{2}/[0-9]{4}. Let *cond*<sub>0</sub> and *cond*<sub>1</sub> be the conditions that refer to the first (and only) capturing group, \$0, with operators ≤ and ≥ respectively. The SMT representation of *cond*<sub>0</sub> for x<sub>0</sub> is SMT(*cond*<sub>0</sub>, x<sub>0</sub>) = 19 ≤ b<sub>cond<sub>0</sub></sub>. Constraint (2) becomes:

$$s\_{0,x\_0} \leftrightarrow (u\_{cond\_0} \to 19 \le b\_{cond\_0}) \land (u\_{cond\_1} \to 19 \ge b\_{cond\_1})$$

Then, we ensure the used conditions are satisfied by all valid examples and none of the conditional invalid examples:

$$\bigwedge\_{x \in \mathcal{V}} \bigwedge\_{cap \in \mathcal{C}(x)} s\_{cap, x} \land \bigwedge\_{x \in \mathcal{I}} \bigvee\_{cap \in \mathcal{C}(x)} \neg s\_{cap, x}. \tag{3}$$

Since we are looking for the minimum set of capture conditions, we add soft clauses to penalize the usage of capture conditions in the solution:

$$\bigwedge\_{cond \in \mathcal{D}\_{\mathcal{C}}} \neg u\_{cond}.\tag{4}$$

Only the capture conditions whose u<sub>cond</sub> is True in the resulting SMT model are considered part of the solution. We also extract the integer argument of each condition from the model value of the corresponding b<sub>cond</sub> variable.
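The effect of the MaxSMT query (constraints (2)–(4)) can be illustrated with a brute-force stand-in that searches directly for a minimum condition set over a finite range of integer arguments (our own simplification, not Forest's encoding):

```python
import itertools

def synth_conditions(valid, invalid, num_caps, bounds=range(0, 40)):
    """Find a minimum set of conditions (cap_index, op, arg), op in {<=, >=},
    satisfied by every valid capture tuple and violated by every
    conditional invalid one. valid/invalid are lists of integer tuples."""
    ops = {"<=": lambda a, b: a <= b, ">=": lambda a, b: a >= b}
    cands = [(i, op, b) for i in range(num_caps) for op in ops for b in bounds]

    def sat(conds, x):
        # an example satisfies a solution if it satisfies all its conditions
        return all(ops[op](x[i], b) for i, op, b in conds)

    for size in range(len(cands) + 1):        # smallest solutions first
        for conds in itertools.combinations(cands, size):
            if all(sat(conds, x) for x in valid) and \
               not any(sat(conds, x) for x in invalid):
                return conds
    return None
```

For day captures from valid dates and the conditional invalid days 32 and 0, the minimum solution is exactly the pair \$0 ≤ 31, \$0 ≥ 1.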

#### 4.3 Capture Conditions Disambiguation

To ensure the solution meets the user's intent, Forest disambiguates the specification using, once again, a procedure based on distinguishing inputs. Once Forest finds two different sets of capture conditions S<sub>1</sub> and S<sub>2</sub> that satisfy the specification, we look for a distinguishing input: a string c that satisfies all capture conditions in S<sub>1</sub> but not those in S<sub>2</sub>, or vice-versa. First, to simplify the problem, Forest eliminates from S<sub>1</sub> and S<sub>2</sub> the conditions present in both: these are not relevant for computing a distinguishing input. Let S<sub>1</sub><sup>\*</sup> (resp. S<sub>2</sub><sup>\*</sup>) be the subset of S<sub>1</sub> (resp. S<sub>2</sub>) containing only the distinguishing conditions, i.e., the conditions that differ from those in S<sub>2</sub> (resp. S<sub>1</sub>).

We do not compute the distinguishing string c directly. Instead, we compute the integer values of the distinguishing captures in c, i.e., the captures that result from applying the regular expression and its capturing groups to the distinguishing input string. We define |C| integer variables c<sub>i</sub>, which correspond to the values of the distinguishing captures: (c<sub>0</sub>, c<sub>1</sub>, ..., c<sub>|C|−1</sub>) = C(c).

As before, let SMT(*cond*, c) be the SMT representation of each condition *cond*. Each capture in C(c) is represented by its respective c<sub>i</sub>, the operator maintains its usual semantics, and the integer argument is its value in the solution to which the condition belongs. Constraint (5) states that c satisfies the conditions in one solution but not the other:

$$\bigwedge\_{cond \in \mathcal{S}\_1^\*} \text{SMT}(cond, c) \neq \bigwedge\_{cond \in \mathcal{S}\_2^\*} \text{SMT}(cond, c). \tag{5}$$

In the end, to produce the distinguishing string c, Forest picks an example from the valid set, applies the regular expression with the capturing groups to it, and replaces its captures with the model values for ci.

Forest asks the user to classify c as valid or invalid. Depending on the user's answer, c is added as a valid or conditional invalid example, effectively eliminating either S<sup>1</sup> or S<sup>2</sup> from the search space.

*Example 4.* Recall the examples from the motivating example. No example invalidates a date with day 32, so Forest finds two correct sets of capture conditions over the regular expression ([0-9]{2})/([0-9]{2})/[0-9]{4}: S<sub>1</sub> = {\$0 ≤ 31, \$0 ≥ 1, \$1 ≤ 12, \$1 ≥ 1} and S<sub>2</sub> = {\$0 ≤ 32, \$0 ≥ 1, \$1 ≤ 12, \$1 ≥ 1}. First, we define two sets containing only the distinguishing conditions: S<sub>1</sub><sup>\*</sup> = {\$0 ≤ 31} and S<sub>2</sub><sup>\*</sup> = {\$0 ≤ 32}. Then, to find c<sub>0</sub>, the value of the distinguishing capture for these solutions, we solve the constraint:

$$\exists c\_0 : (c\_0 \le 31) \neq (c\_0 \le 32)$$

and get the value c<sub>0</sub> = 32, which satisfies S<sub>2</sub><sup>\*</sup> (and S<sub>2</sub>) but not S<sub>1</sub><sup>\*</sup> (or S<sub>1</sub>).
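Forest solves this query with the SMT solver; as a purely illustrative stand-in, a distinguishing capture value can also be found by scanning a range of integers (function name and search range are our own):

```python
def distinguishing_capture(conds1, conds2, search=range(0, 100)):
    """Find an integer capture value satisfying the conjunction of one
    solution's conditions but not the other's, as in the query of
    Example 4. conds1/conds2 are lists of predicates over one capture."""
    sat = lambda conds, v: all(f(v) for f in conds)
    for v in search:
        if sat(conds1, v) != sat(conds2, v):
            return v
    return None
```

For S<sub>1</sub><sup>\*</sup> = {\$0 ≤ 31} and S<sub>2</sub><sup>\*</sup> = {\$0 ≤ 32}, the scan finds the value 32.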

If we pick the first valid example, "19/08/1996", as the basis for c, the respective distinguishing input is c = "32/08/1996". Once the user classifies c as invalid, it is added as a conditional invalid example and S<sub>2</sub> is removed from consideration.

### 5 Related Work

Program synthesis has been successfully used in many domains such as string processing [8,19,7,26], query synthesis [11,25,17], data wrangling [2,5], and functional synthesis [3,6]. In this section, we discuss prior work on the synthesis of regular expressions [10,1] that is most closely related to our approach.

Previous approaches that perform general string processing [7,26] restrict the form of the regular expressions that can be synthesized. In contrast, we support a wide range of regular expression operators, including the Kleene closure, positive closure, option, and range. More recent work targeting the synthesis of regexes includes AlphaRegex [10] and Regel [1]. AlphaRegex performs an enumerative search and uses under- and over-approximations of regexes to prune the search space. However, AlphaRegex is limited to the binary alphabet and does not support the kind of regexes we need to synthesize for form validations. Regel [1] is a state-of-the-art synthesizer of regular expressions based on a multi-modal approach that combines input-output examples with a natural language description of user intent. It uses natural language to build hierarchical sketches that capture the high-level structure of the regex to be synthesized. In addition, it prunes the search space by using under- and over-approximations and symbolic regexes combined with SMT-based reasoning. Regel's evaluation [1] has shown that its PBE engine is an order of magnitude faster than AlphaRegex. While Regel targets more general regexes suitable for search-and-replace operations, we target regexes for form validation, which usually have more structure. In our approach, we take advantage of this structure to split the problem into independent subproblems. This can be seen as a special case of sketching [22] where each hole is independent. Our pruning techniques are orthogonal to those used by Regel and are based on removing equivalent regexes prior to the search and on removing equivalent failed regexes during search. To the best of our knowledge, no previous work has focused on the synthesis of conditions over capturing groups.

Instead of using input-output examples, there are other approaches that synthesize regexes solely from natural language [9,12,27]. We see these approaches as orthogonal to ours and expect that Forest can be improved by hints provided by a natural language component such as was done in Regel.

### 6 Experimental Results

*Implementation.* Forest is open-source and publicly available at https://github. com/Marghrid/FOREST. Forest is implemented in Python 3.8 on top of Trinity, a general-purpose synthesis framework [13]. All SMT formulas are solved using the Z3 SMT solver, version 4.8.9 [15]. To find distinguishing inputs in regular expression synthesis, Forest uses Z3's theory of regular expressions [23]. To check the enumerated regexes against the examples, we use Python's regex library [18]. The results presented herein were obtained using an Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz, with 64GB of RAM, running Debian GNU/Linux 10. All processes were run with a time limit of one hour.

*Benchmarks.* To evaluate Forest, we used 64 benchmarks based on real-world form-validation regular expressions. These were collected from regular expression validators in validation frameworks and from regexlib [20], where users can upload their own regexes. Among these 64 benchmarks there are different formats: national IDs, identifiers of products, date and time, vehicle registration numbers, postal codes, email and phone numbers. For each benchmark, we generated a set of string examples. All 64 benchmarks require a regular expression to validate the examples, but only 7 require capture conditions. On average, each instance is composed of 13.2 valid examples (ranging from 4 to 33) and 9.3 invalid (ranging from 2 to 38). The 7 instances that target capture conditions have on average 6.3 conditional invalid examples (ranging from 4 to 8).

The goal of this experimental evaluation is to answer the following questions: Q1: How does Forest compare against Regel? (section 6.1)

Q2: How does pruning affect multi-tree's time performance? (section 6.2)


Q5: How many examples are required to return a correct solution? (section 6.4)

Forest, by default, uses static multi-tree (when possible) with pruning. It correctly solves 31 benchmarks (48%) in under 10 seconds. In one hour, Forest solves 47 benchmarks (73%) with 96% accuracy: only two solutions did not correspond to the desired regex validation. Forest disambiguates only among programs at the same depth, so if the first solution is not at the same depth as the correct one, the correct solution is never found. After 1 hour of running time, Forest is interrupted, but it prints its current best validation before terminating. After the timeout, Forest returned 3 more regexes, 2 of which were the correct solutions for their benchmarks. In all benchmarks for which Forest returns a solution, the first matching regular expression is found in under 10 minutes; in 40 benchmarks, the first regex is found in under 10 seconds. The rest of the time is spent disambiguating the input examples. Forest interacts with the user to disambiguate the examples in 27 benchmarks. Overall, it asks 1.8 questions and spends 38.6 seconds computing distinguishing inputs, on average.

Table 1: Comparison of time performance using different synthesis methods

Figure 5: Instances solved using different methods

Regarding the synthesis of capture conditions, in 5 of the benchmarks, we need only 2 capturing groups and at most 4 conditions. In these instances, the conditions' synthesis takes under 2 seconds. The remaining 2 benchmarks need 4 capturing groups and take longer: 99 seconds to synthesize 4 conditions and 1068 seconds for 6 conditions. During capture conditions synthesis, Forest interacts 7.14 times and takes 0.1 seconds to compute distinguishing inputs, on average.

Table 1 shows the number of instances solved in under 10, 60 and 3600 seconds using Forest, as well as using the different variations of the synthesizer which will be described in the following sections. The cactus plot in Figure 5 shows the cumulative synthesis time on the y-axis plotted against the number of benchmarks solved by each variation of Forest (on the x-axis). The synthesis methods that correspond to lines more to the right of the plot are able to solve more benchmarks in less time. We also compare solving times with Regel [1]. Regel takes as input examples and a natural description of user intent. We consider not only the complete Regel synthesizer, but also the PBE engine of Regel by itself, which we denote by Regel PBE.

#### 6.1 Comparison with Regel

As mentioned in section 5, Regel's synthesis procedure is split into two steps: sketch generation (using a natural language description of desired behavior) and sketch completion (using input-output examples). To compare Regel and Forest, we extended our 64 form validation benchmarks with a natural language description. To assess the importance of the natural language description, we also ran Regel using only its PBE engine. Sketch generation took on average 60 seconds per instance, and successfully generated a sketch for 63 instances. The remaining instance was run without a sketch. We considered only the highest ranked sketch for each instance. In Table 1 we show how many instances can be solved with different time limits for sketch completion; note that these values do not include the sketch generation time. Regel returned a regular expression for 47 instances within the time limit. Since Regel does not implement a disambiguation procedure, the returned regular expression does not always exhibit the desired behavior, even though it correctly classifies all examples. Of the 47 synthesized expressions, 31 exhibit the desired intent. This is a 66% accuracy, which is the same as Forest without disambiguation (Forest's 1st regex) but it is much lower than Forest with disambiguation at 96%. We also observe that Regel's performance is severely impaired when using only its PBE engine.

51 out of the 63 generated sketches are of the form -{S1, ..., Sn}, where each S<sub>i</sub> is a concrete sub-regex, i.e., has no holes. This construct indicates that the desired regex must contain *at least* one of S1, ..., Sn, and contains no information about the top-level operators used to connect them. 22 of the 47 synthesized regexes are based on sketches of that form, and they result from the direct concatenation of *all* components in the sketch. No new components are generated during sketch completion. Thus, most of Regel's sketches could be integrated into Forest, whose multi-tree structure holds precisely those top-level operators that were missing from Regel's sketches.

#### 6.2 Impact of pruning the search space and splitting examples

To evaluate the impact of pruning the search space as described in section 3.2, we ran Forest with all pruning techniques disabled. In the scatter plot in Figure 6a, we can compare the solving time on each benchmark with and without pruning. Each mark in the plot represents an instance: the value on the y-axis shows the synthesis time of multi-tree with pruning disabled, and the value on the x-axis the synthesis time with pruning enabled. The marks above the y = x line (also represented in the plot) represent problems that took longer to synthesize without pruning than with pruning. On average, with pruning, Forest can synthesize regexes in 42% of the time and enumerates about 15% of the regexes before returning. There is no significant change in the number of interactions before returning the desired solution.

Figure 6: Comparison of synthesis time using different variations of Forest.

Forest is able to split the examples and use static multi-tree as described in section 3.2 in 52 benchmarks (81%). The remaining 12 are solved using dynamic multi-tree. To assess the impact of using static multi-tree, we ran Forest with a version of the multi-tree enumerator that does not split the examples and jumps directly to dynamic multi-tree solving. In the scatter plot in Figure 6b, we compare the solving times of each benchmark. Using static multi-tree when possible, Forest requires, on average, less than two thirds of the time (59.1%) to return the desired regex for benchmarks solved by both methods. Furthermore, with static multi-tree Forest can synthesize more complex regexes: the maximum number of nodes in a solution returned by dynamic multi-tree is 12 (avg. 6.7), while static multi-tree synthesizes regexes of up to 24 nodes (avg. 10.3).

#### 6.3 Multi-tree versus *k*-tree and line-based encodings

To evaluate the performance of multi-tree enumeration, we ran Forest with two other enumeration encodings: k-tree and line-based. The latter is a state-of-the-art encoding for the synthesis of SQL queries [17]. k-tree is the default enumerator in Trinity [13], and the line-based enumerator is available in Squares [16]. The k-tree encoding has a structure very similar to that of multi-tree, so our pruning techniques were easily applied to it. The line-based encoding, on the other hand, is intrinsically different, so the pruning techniques were not implemented; we compare the line-based encoding to multi-tree without pruning. In every other aspect, the three encodings were run under the same conditions, using Forest's regex DSL. k-tree is able to synthesize programs with up to 10 nodes, while the line-based encoding synthesizes programs of up to 9 nodes. Neither encoding outperforms multi-tree.

As seen in Table 1, line-based encoding does not outperform the tree-based encodings for the domain of regexes while it was much better for the domain of SQL queries [17]. We conjecture this disparity arises from the different nature of DSLs. Most SQL queries, when represented as a tree, leave many branches of the tree unused, which results in a much larger tree and SMT encoding.

#### 6.4 Impact of fewer examples

To assess the impact of providing fewer examples on the accuracy of the solution, we ran Forest with modified versions of each benchmark. First, each benchmark was run with at most 10 valid and 10 invalid examples, chosen randomly among all examples. Conditional invalid examples are already very few per instance, so these were not altered. The accuracy of the returned regexes is slightly lower.

With only 10 valid and 10 invalid examples, Forest returns the correct regex in 93.5% of the benchmarks, a decrease of only 2.5% relative to the results with all examples. We also saw an increase in the number of interactions before returning, since fewer examples are more likely to be ambiguous. With only 10 examples, Forest interacts on average 2.2 times per benchmark, an increase of about a fifth. The increase in the number of interactions is reflected in a small increase in the synthesis time (less than 1%).

Afterwards, we reduced the number of examples even further: only 5 valid and 5 invalid. The accuracy of Forest in this setting dropped to 71%. On average, it interacted 4.3 times per benchmark, over twice as often as before.

### 7 Conclusions and Future Work

Regexes are commonly used to enforce patterns and validate the input fields of digital forms. However, writing regex validations requires specialized knowledge that not all users possess. We have presented a new algorithm for the synthesis of regex validations from examples that leverages the common structure shared between valid examples. Our experimental evaluation shows that the multi-tree representation synthesizes three times more regexes than previous representations in the same amount of time and, together with the user interaction model, Forest solves 70% of the benchmarks with the correct user intent. We verified that Forest maintains a very high accuracy with as few as 10 examples of each kind. We also observed that our approach outperforms Regel, a state-of-the-art synthesizer, in the domain of form validations.

As future work, we would like to explore the synthesis of more complex capture conditions, such as conditions depending on more than one capture. This would allow more restrictive validations; for example, in a date, the possible values for the day could depend on the month. Another possible extension to Forest is to automatically separate invalid from conditional invalid examples, making this distinction imperceptible to the user.



# **Probabilities**

# **Finding Provably Optimal Markov Chains**

Jip Spel<sup>1</sup>, Sebastian Junges<sup>2</sup>, and Joost-Pieter Katoen<sup>1</sup>

> <sup>1</sup> RWTH Aachen University, Aachen, Germany — jip.spel@cs.rwth-aachen.de
> <sup>2</sup> University of California, Berkeley, California, USA

**Abstract.** Parametric Markov chains (pMCs) are Markov chains with symbolic (aka: parametric) transition probabilities. They are a convenient operational model to treat robustness against uncertainties. A typical objective is to find the parameter values that maximize the probability of reaching some target states. In this paper, we consider automatically proving robustness, that is, an ε-close upper bound on the maximal reachability probability. The result of our procedure actually provides an almost-optimal parameter valuation along with this upper bound.

We propose to tackle these ETR-hard problems by a tight combination of two significantly different techniques: monotonicity checking and parameter lifting. The former builds a partial order on states to check whether a pMC is (locally or globally) monotonic in a certain parameter, whereas parameter lifting is an abstraction technique based on the iterative evaluation of pMCs without parameter dependencies. We explain our novel algorithmic approach and experimentally show that it significantly reduces the time to determine almost-optimal parameter valuations.

### **1 Introduction**

Background and problem setting. Probabilistic model checking [3, 20] is a well-established field and has various applications, but assumes probabilities to be fixed constants. To deal with uncertainties, symbolic parameters are used. Parametric Markov chains (pMCs, for short) define a family of Markov chains with uncountably many family members, called instantiations, by having symbolic (aka: parametric) transition probabilities [10, 22]. We are interested in determining optimal parameter settings: which instantiation meets a given objective best? The typical objective is to maximize the reachability probability of a set of target states. This question is inspired by practical applications such as: what are the optimal parameter settings in randomised controllers to minimise power consumption, and what is the optimal bias of coins in a randomised distributed algorithm to maximise the chance of achieving mutual exclusion? For most applications, it suffices to find parameters that attain a given quality of service that is

Supported by DFG RTG 2236 "UnRAVeL" and ERC AdG 787914 FRAPPANT.

Supported by the NSF grants 1545126 (VeHICaL) and 1646208, by the DARPA Assured Autonomy program, by Berkeley Deep Drive, and by Toyota under the iCyPhy center.

<sup>©</sup> The Author(s) 2021

J. F. Groote and K. G. Larsen (Eds.): TACAS 2021, LNCS 12651, pp. 173–190, 2021. https://doi.org/10.1007/978-3-030-72016-2_10

ε-close to the unknown optimal solution. More precisely, this paper concentrates on automatically proving ε-robustness, i.e., determining an upper bound which is ε-close to the maximal reachability probability. As a by-product, our procedure actually provides an almost-optimal parameter valuation too.

Existing parameter synthesis techniques. Efficient techniques have been developed in recent years for the feasibility problem: given a parametric Markov chain and a reachability objective, find an instantiation that reaches the target with at least a given probability. To solve this problem, it suffices to "guess" a correct family member, i.e., a correct parameter instantiation. Verifying the "guessed" instantiation against the reachability objective is readily done using off-the-shelf Markov chain model-checking algorithms. Most recent progress is based on advanced techniques that make informed guesses: this ranges from using sampling techniques [14], guided sampling such as particle swarm optimisation [7], and greedy search [24], to solving different variants of a convex optimisation problem around a sample [8, 9]. Sampling has been accelerated by reusing previous model-checking results [25], or by just-in-time compilation of the parameter function [12]. These methods are inherently inadequate for finding optimal parameter settings. To the best of our knowledge, optimal parameter synthesis has received scant attention so far. A notable exception is the analysis (e.g., using SMT techniques) of rational functions, typically obtained by some form of state elimination [10, 12, 15], that symbolically represent reachability probabilities in terms of the parameters. These functions can be exponential in the number of parameters [16] and become infeasible for more than two parameters. Parameter lifting [5, 6, 25] remedies this by using an abstraction technique but, due to an exponential blow-up of region splitting, is limited to a handful of parameters. The challenge is to solve optimal parameter synthesis problems with more parameters.

Approach. We propose to tackle the optimal synthesis problem by a deep integration of two seemingly unrelated techniques: monotonicity checking [27] and parameter lifting [25]. The former builds a partial order on the state space to check whether a pMC is (locally or globally) monotonic in a certain parameter, while the latter is an abstraction technique that "lifts" the parameter dependencies, obtaining interval MCs [17, 21], and solves them in an iterative manner. To construct an efficient combination, we extend both methods such that they profit from each other. This is done by combining them with a tailored divide-and-conquer component; see Fig. 1. To prove bounds on the induced reachability probability, parameter lifting has been the undisputed state of the art, despite the increased attention that parameter synthesis has received over recent years. This paper improves parameter lifting with more advanced reasoning capabilities that involve properties of the derivative, rather than the actual probabilities. These reasoning methods enable reducing the exponent of the inherently exponential-time procedure. This conceptual advantage is joined with various engineering efforts. Parameter lifting is accelerated by using side products of monotonicity analysis such as local monotonicity and shrunk parameter regions. Furthermore, bounds obtained by parameter lifting are used to obtain a cheap rule accelerating the

**Fig. 1.** The symbiosis of parameter lifting and monotonicity checking. Red are new interactions, compared to earlier work. Details are given in Sect. 3.

monotonicity checker. The interplay between the two advanced techniques is tricky and requires a careful treatment.

Note that we are not the first to exploit monotonicity in the context of pMCs. Hutschenreiter et al. [16] showed that the complexity of model checking (a monotone fragment of) PCTL on monotonic pMCs is lower than on general pMCs. Pathak et al. [24] provided an efficient greedy approach to repair monotonic pMCs. Recently, Gouberman et al. [13] used monotonicity for hitting probabilities in perturbed continuous-time MCs.

Experimental results. We realised the integrated approach on top of the Storm [11] model checker. Experiments on several benchmarks show that optimal synthesis is possible: (1) on benchmarks with up to a few hundred parameters, (2) on benchmarks that cannot be handled without monotonicity, (3) while accelerating pure parameter lifting by up to two orders of magnitude. Our approach induces a bit of overhead on small instances for some benchmarks, and starts to pay off when increasing the number of parameters.

Main contribution. In summary, the main contribution of this paper is a tight integration of parameter lifting and monotonicity checking. Experiments indicate that this novel combination substantially improves upon the state-of-the-art in optimal parameter synthesis.

Organisation of the paper. Section 2 provides the necessary technical background and formalises the problem. Section 3 explains the approach, in particular the meaning of the arrows in Fig. 1. Section 4 discusses how state bounds can be exploited in the monotonicity checker. Section 5 details how to exploit local monotonicity in parameter lifting. Section 6 then considers the tight interplay via the divide-and-conquer method. Section 7 reports on the experimental results of our prototypical implementation in Storm, while Section 8 concludes the paper.

### **2 Problem Statement**

A probability distribution over a finite or countably infinite set X is a function μ: X → [0, 1] ⊆ R with Σ<sub>x∈X</sub> μ(x) = 1. The set of all distributions on X is denoted by Distr(X). Let a ∈ R<sup>n</sup> denote (a<sub>1</sub>,...,a<sub>n</sub>). The set of multivariate polynomials over ordered variables x = (x<sub>1</sub>,...,x<sub>n</sub>) is denoted Q[x]. For a polynomial f and variable x, we write x ∈ f if the variable occurs in the polynomial f. An instantiation for a finite set V of real-valued variables is a function u: V → R. We often denote u as a vector u ∈ R<sup>n</sup> with u<sub>i</sub> := u(x<sub>i</sub>) for x<sub>i</sub> ∈ V. A polynomial f can be interpreted as a function f: R<sup>n</sup> → R, where f(u) is obtained by substitution, i.e., f[x ← u], where each occurrence of x<sub>i</sub> in f is replaced by u(x<sub>i</sub>).
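To make the substitution notation concrete, here is a minimal sketch (all names hypothetical, not from the paper) representing a multivariate polynomial as a map from exponent tuples to rational coefficients, with instantiation implemented as substitution:

```python
from fractions import Fraction

# A polynomial as a map from exponent tuples (one entry per ordered variable)
# to coefficients; over the single variable p, p*(1-p) = p - p^2 becomes:
f = {(1,): Fraction(1), (2,): Fraction(-1)}

def instantiate(poly, u):
    """f[x <- u]: replace each variable x_i by u(x_i) and sum the monomials."""
    total = Fraction(0)
    for exponents, coeff in poly.items():
        term = coeff
        for value, e in zip(u, exponents):
            term *= value ** e
        total += term
    return total

# The instantiation u = {p -> 1/3} yields f(u) = 1/3 - 1/9 = 2/9.
print(instantiate(f, (Fraction(1, 3),)))
```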

**Definition 1 (pMC).** A parametric Markov chain (pMC) is a tuple M = (S, s<sub>I</sub>, T, V, P) with a finite set S of states, an initial state s<sub>I</sub> ∈ S, a finite set T ⊆ S of target states, a finite set V of real-valued variables (parameters), and a transition function P: S × S → Q[V].

A pMC M is a (discrete-time) Markov chain (MC) if the transition function yields well-defined probability distributions, i.e., P(s, ·) ∈ Distr(S) for each s ∈ S. Applying an instantiation u to a pMC M yields M[u] by replacing each f ∈ Q[V] in M by f(u). An instantiation u is well-defined (for M) if M[u] is an MC. A well-defined instantiation u is graph-preserving (for M) if the topology is preserved, i.e., P(s, s′) ≠ 0 implies P(s, s′)(u) ≠ 0 for all states s and s′. A set of instantiations is called a region. A region R is well-defined (graph-preserving) if u is well-defined (graph-preserving) for all u ∈ R. In this paper, we consider only graph-preserving regions.

For a parameter-free MC M, Pr<sup>s</sup><sub>M</sub>(♦T) ∈ [0, 1] ⊆ R denotes the probability that from state s the target T is eventually reached. For a formal definition, we refer to, e.g., [4, Ch. 10]. For a pMC M, Pr<sup>s</sup><sub>M</sub>(♦T) is not a constant, but rather a function Pr<sup>s→T</sup><sub>M</sub> mapping instantiations to [0, 1], with Pr<sup>s→T</sup><sub>M</sub>(u) = Pr<sup>s</sup><sub>M[u]</sub>(♦T). The closed form of Pr<sup>s→T</sup> on a graph-preserving region is a rational function over V, i.e., a fraction of two polynomials over V. On a graph-preserving region, the function Pr<sup>s→T</sup> is continuously differentiable [25]. We call Pr<sup>s→T</sup><sub>M</sub> the solution function, and for conciseness, we often omit the subscript M. Graph-preserving instantiations u, u′ preserve zero-one probabilities, i.e., Pr<sup>s→T</sup>(u) = 0 implies Pr<sup>s→T</sup>(u′) = 0, and analogously for = 1. We simply write Pr<sup>s→T</sup> = 0 (or = 1). Let ⊤ (⊥) denote the states s ∈ S with Pr<sup>s→T</sup> = 1 (Pr<sup>s→T</sup> = 0). By a standard preprocessing [4], we may safely assume a single ⊤ and a single ⊥ state.

Problem statement. This paper is concerned with the following questions for a given pMC <sup>M</sup> with target states <sup>T</sup>, and region <sup>R</sup>:

Optimal synthesis. Find the instantiation u<sup>∗</sup> such that

$$\vec{u}^\* = \arg\max\_{\vec{u}\in R} \text{Pr}\_{\mathcal{M}[\vec{u}]}(\lozenge T)$$


ε-robust synthesis. Find a value λ and an instantiation u<sup>∗</sup> such that

$$\max\_{\vec{u}\in R} \operatorname{Pr}\_{\mathcal{M}[\vec{u}]}(\lozenge T) - \varepsilon \le \operatorname{Pr}\_{\mathcal{M}[\vec{u}^\*]}(\lozenge T) \le \max\_{\vec{u}\in R} \operatorname{Pr}\_{\mathcal{M}[\vec{u}]}(\lozenge T) \le \lambda \le \max\_{\vec{u}\in R} \operatorname{Pr}\_{\mathcal{M}[\vec{u}]}(\lozenge T) + \varepsilon.$$

**Fig. 2.** Toy examples for pMCs.

The optimal synthesis problem is ETR-hard [28], i.e., as hard as deciding the existence of a real root of a multivariate polynomial. It is thus NP-hard and in PSPACE. The same applies to ε-robustness. The value of λ can be viewed as the optimal reachability probability of T, up to the robustness tolerance ε, over all possible parameter values, while u<sup>∗</sup> is the instantiation that maximises the probability to reach T.

Like [28], we assume pMCs to be simple, i.e., P(s, s′) ∈ {x, 1−x | x ∈ V} ∪ Q for all s, s′ ∈ S, and Σ<sub>s′</sub> P(s, s′) = 1. Theoretically, the above problem for simple pMCs is as hard as for general pMCs, and practically, most pMCs are simple. For simple pMCs, the graph-preserving instantiations are in (0, 1)<sup>|V|</sup>. Regions are assumed to be well-defined, rectangular and closed, i.e., a region is a Cartesian product of closed intervals, R = ×<sub>x∈V</sub> [l<sub>x</sub>, u<sub>x</sub>]. Let R(x) denote the interval [l<sub>x</sub>, u<sub>x</sub>] and occur(s) the set of variables {x ∈ V | ∃s′ ∈ S. x ∈ P(s, s′)}. For simple pMCs, this set has cardinality at most one. A state s is called parametric if occur(s) ≠ ∅; we write occur(s) = x if {x} = occur(s).

Example 1. Fig. 2(a) depicts a pMC. A region R is given by p ∈ [1/4, 1/2]. An instantiation u = {p ↦ 1/3} ∈ R yields the MC in Fig. 2(b). The solution function of M<sub>1</sub> is Pr<sup>s0→T</sup> = p · (1 − p). Indeed, Pr<sup>s0→T</sup>(u) = 2/9 = Pr<sub>M1[u]</sub>(♦T).
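The example can be checked mechanically. The sketch below assumes a topology for M<sub>1</sub> consistent with the stated solution function p·(1−p) (the figure itself is not reproduced here, so the shape is hypothetical), and solves the recursive reachability equations by backward substitution:

```python
from fractions import Fraction

def reachability_s0(p):
    """Pr(s) = sum over successors s' of P(s,s') * Pr(s'), for an MC shaped
    like M1: s0 moves to s1 with probability p (otherwise to the sink), and
    s1 reaches the target with probability 1-p (otherwise the sink).
    This topology is assumed only to reproduce the solution function p*(1-p)."""
    pr_top, pr_bot = Fraction(1), Fraction(0)   # target and sink states
    pr_s1 = (1 - p) * pr_top + p * pr_bot
    pr_s0 = p * pr_s1 + (1 - p) * pr_bot
    return pr_s0

# Instantiation u = {p -> 1/3}: 1/3 * 2/3 = 2/9, as in Example 1.
print(reachability_s0(Fraction(1, 3)))
```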

### **3 Main Ingredients in a Nutshell**

To solve the problem statement, we consider an iterative method which analyzes regions, and, if necessary, splits these regions. In particular, we combine two approaches — parameter lifting and monotonicity checking — as shown in Fig. 1.

#### **3.1 The Monotonicity Checker**

We consider local and global monotonicity. We start with defining the latter.

**Definition 2 (Global monotonicity).** A continuously differentiable function f on region R is monotonic increasing in variable x, denoted f↑<sup>R</sup><sub>x</sub>, if ∂f/∂x (u) ≥ 0 for all u ∈ R.<sup>3</sup> The pMC M = (S, s<sub>I</sub>, T, V, P) is monotonic increasing in parameter x ∈ V on graph-preserving region R, written M↑<sup>R</sup><sub>x</sub>, if Pr<sup>sI→T</sup> ↑<sup>R</sup><sub>x</sub>.

<sup>3</sup> To be precise, on the interior of the closed set R.

**Fig. 3.** Simple pMC that indeed is an iMC.

Monotonic decreasing, written M↓<sup>R</sup><sub>x</sub>, is defined analogously. Let succ(s) = {s′ ∈ S | P(s, s′) ≠ 0} be the set of direct successors of s. Given the recursive equation Pr<sup>s→T</sup> = Σ<sub>s′∈succ(s)</sub> P(s, s′) · Pr<sup>s′→T</sup> for states s ≠ ⊤, ⊥, we have

$$\mathcal{M}\uparrow\_x^R \quad \text{iff} \quad \frac{\partial}{\partial x} \left( \sum\_{s' \in \text{succ}(s\_I)} \mathcal{P}(s\_I, s') \cdot \text{Pr}^{s' \to T} \right) (\vec{u}) \ge 0 \,,$$

for all <sup>u</sup> <sup>∈</sup> <sup>R</sup>. Rather than checking global monotonicity, the monotonicity checker determines a subset of the locally monotone state-parameter pairs. Such pairs intuitively capture monotonicity of a parameter only locally at a state s.

**Definition 3 (Local monotonicity).** Function Pr<sup>s→T</sup> is locally monotonic increasing in parameter x (at state s) on region R, written Pr<sup>s→T</sup> ↑<sup>ℓ,R</sup><sub>x</sub>, if

$$\forall \vec{u} \in R. \qquad \left(\sum\_{s' \in \text{succ}(s)} \left(\frac{\partial}{\partial x} \mathcal{P}(s, s')\right) \cdot \mathbb{P}r^{s' \to T}\right)(\vec{u}) \ge 0.$$

Thus, while global monotonicity considers the derivative of the entire solution function, local monotonicity (in s) only considers the derivative of the first transition (emanating from s). Local monotonicity of parameter x in every state implies global monotonicity of x, as shown in [27]. As checking global monotonicity is co-ETR hard [27], a practical approach is to check sufficient conditions for monotonicity. These conditions are based on constructing a pre-order on the states of the pMC; this is explained in detail in Section 4.

Example 2. For R = {u(p) ∈ [1/10, 9/10]}, the pMC M<sub>1</sub> in Fig. 2(a) is locally monotonic increasing in p at s<sub>0</sub> and locally monotonic decreasing in p at s<sub>1</sub>. From this, we cannot conclude anything about global monotonicity of p on R. Indeed, the pMC is not globally monotonic on R. M<sub>1</sub> is globally monotonic on R′ = {u(p) ∈ [1/10, 1/2]}, but this cannot be concluded from the statement above. In contrast, the pMC M<sub>2</sub> in Fig. 2(c) is locally monotonic increasing in p at both s<sub>0</sub> and s<sub>1</sub>, and is therefore globally monotonic increasing in p.
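Example 2 can be sanity-checked numerically (a sketch, not part of the paper's toolchain): sampling the sign of the derivative of the solution function p·(1−p) from Example 1 distinguishes the region where the pMC is globally monotonic from the one where it is not:

```python
def derivative_signs(lo, hi, steps=1000, tol=1e-9):
    """Sample the sign of d/dp [p*(1-p)] = 1 - 2p over [lo, hi]."""
    signs = set()
    for i in range(steps + 1):
        p = lo + (hi - lo) * i / steps
        d = 1 - 2 * p
        signs.add("+" if d > tol else "-" if d < -tol else "0")
    return signs

# On [1/10, 1/2] the derivative is never negative: globally monotonic there.
print(derivative_signs(0.1, 0.5))
# On [1/10, 9/10] the sign flips: p*(1-p) is not monotonic on that region.
print(derivative_signs(0.1, 0.9))
```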

#### **3.2 The Parameter Lifter**

The key idea of parameter lifting [25] is to drop all parameter dependencies (parameters that occur at multiple states in a pMC) by introducing fresh parameters. The outcome is an interval Markov chain [17, 21], which can be considered a special case of pMCs in which no parameter occurs at multiple states.

**Definition 4 (Interval MC).** A pMC is a (simple) interval MC (iMC) if occur(s) ∩ occur(s′) = ∅ for all states s ≠ s′.

All iMCs in this paper are simple. We typically label the transitions emanating from a state s with x = occur(s) in an iMC by R(x) = [l<sub>x</sub>, u<sub>x</sub>].

Example 3. The pMC in Fig. 3(a) is an iMC. For a fixed R, the typical notation is given in Fig. 3(b). For the pMC <sup>M</sup><sup>1</sup> in Fig. 2(a), the parameter <sup>p</sup> occurs at states s<sup>0</sup> and s1, so that this pMC is not an iMC.

**Definition 5 (Relaxation).** The relaxation of a simple pMC M = (S, s<sub>I</sub>, T, V, P) is the iMC relax(M) = (S, s<sub>I</sub>, T, V′, P′) with V′ = {x<sub>s</sub> | s ∈ S, occur(s) ≠ ∅} and P′(s, s′) = P(s, s′)[occur(s) ← x<sub>s</sub>].

For a state s with occur(s) = x, let relax(R)(x<sub>s</sub>) = R(occur(s)). Likewise, an instantiation u ∈ R is mapped to relax(u) by relax(u)(x<sub>s</sub>) = u(occur(s)).

Extremal reachability probabilities on iMCs are reached at the extremal values of a region. Formally [25], for each state <sup>s</sup> and region <sup>R</sup> in pMC <sup>M</sup>:

$$\max\_{\vec{u}\in R} \mathsf{Pr}\_{\mathcal{M}}^{s\to T}(\vec{u}) \; \leq \max\_{\vec{u}\in \text{relax}(R)} \mathsf{Pr}\_{\text{relax}(\mathcal{M})}^{s\to T}(\vec{u}).\tag{1}$$

This result is a direct consequence of local monotonicity at all states implying global monotonicity. The extremal values for the reachability probabilities in the obtained iMCs are computed by interpreting the iMCs as MDPs and applying off-the-shelf MDP model checking. We denote the right-hand side of (1) as the upper bound on R, written U<sub>R</sub>(s). Analogously, we define a lower bound L<sub>R</sub>(s).

Example 4. The pMC M<sup>3</sup> in Fig. 3(a) is the relaxation of the pMC M<sup>1</sup> in Fig. 2(a). Indeed, for <sup>R</sup> <sup>=</sup> {u(p) <sup>∈</sup> [1/4, <sup>3</sup>/4]}:

$$\max\_{\vec{u}\in R} \mathsf{Pr}\_{\mathcal{M}\_1}^{s\_0 \to T}(\vec{u}) = 1/4 \le 9/16 = \max\_{\vec{u}\in \text{relax}(R)} \mathsf{Pr}\_{\mathcal{M}\_3}^{s\_0 \to T}(\vec{u}).$$
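Inequality (1) can be illustrated mechanically on this toy example. The sketch below (names hypothetical) enumerates the corners of the relaxed region, where the two occurrences of p are replaced by independent fresh parameters, and compares the bound against a brute-forced true maximum of p·(1−p):

```python
from fractions import Fraction

def relaxed_max(lo, hi):
    """Upper bound from relaxation: maximise x_s0 * (1 - x_s1) with the two
    fresh parameters chosen independently from the corners {lo, hi}."""
    return max(x0 * (1 - x1) for x0 in (lo, hi) for x1 in (lo, hi))

def true_max(lo, hi, steps=4000):
    """Brute-force the actual maximum of p*(1-p) over [lo, hi] on a grid."""
    grid = [lo + (hi - lo) * Fraction(i, steps) for i in range(steps + 1)]
    return max(p * (1 - p) for p in grid)

lo, hi = Fraction(1, 4), Fraction(3, 4)
# True max 1/4 (at p = 1/2) versus relaxed bound 9/16 (x_s0 = 3/4, x_s1 = 1/4).
print(true_max(lo, hi), relaxed_max(lo, hi))
```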

#### **3.3 Divide and Conquer**

Figure 4 shows how the extremal value for region R<sub>ι</sub>, pMC M, reachability property ϕ, and precision ε can be computed using only parameter lifting [25]. This paper extends this iterative approach to include monotonicity checking. The main idea is to analyze regions and split them if the result is inconclusive. The approach uses a queue of regions that need to be checked and the current extremal value CurMax found so far. In particular, we maintain a lower bound CurMax on the optimum and know a (potentially trivial) upper bound: (CurMax + ε) ≥ max<sub>R̂∈Q</sub> U<sub>R̂</sub>(s<sub>I</sub>). We iteratively check regions and improve both bounds until a satisfactory solution is found. Initially, the queue only contains R<sub>ι</sub>. For a region R selected from the queue, we compute an upper bound U<sub>R</sub> with parameter lifting. If U<sub>R</sub> at the initial state

**Fig. 4.** Divide and conquer with pure parameter lifting

is below the current optimum, we can safely discard R. Otherwise, we attempt to improve CurMax by guessing u ∈ R and computing Pr<sup>s→T</sup><sub>M</sub>(u) using model checking<sup>4</sup>. If Pr<sup>s→T</sup><sub>M</sub>(u) exceeds CurMax, we update CurMax. Now, we check whether we can terminate:

In particular, let the maximum so far be bounded by max<sub>R̂∈Q∪{R}</sub> U<sub>R̂</sub>(s<sub>I</sub>). If this upper bound is below CurMax + ε, we are done, and return CurMax together with the u associated with CurMax. Otherwise, we continue and split R into smaller regions. By default, parameter lifting splits R along all dimensions. This algorithm converges in the limit [25].

Example 5. Reconsider Ex. 4, and assume we want to show max<sub>u∈R</sub> Pr<sup>s0→T</sup><sub>M1</sub>(u) ≤ 1/4, with ε = 1/8. We sample in (the middle of) R and obtain CurMax = 1/4, while the upper bound U<sub>R</sub>(s<sub>I</sub>) from Ex. 4 is 9/16. We split R into two regions R<sub>1</sub> = {u(p) ∈ [1/4, 1/2]} and R<sub>2</sub> = {u(p) ∈ [1/2, 3/4]}. Parameter lifting reveals that for both regions the bound is 3/8. Thus, 1/4 is an ε-close instance.
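The loop of Fig. 4 can be sketched on this example. This is a simplification, with the model checker and parameter lifter inlined for the solution function p·(1−p), and all names hypothetical:

```python
def corner_bound(l, u):
    """Parameter-lifting bound for p*(1-p) on [l, u]: the relaxation treats
    the two occurrences of p independently, so maximise over the corners."""
    return max(p1 * (1 - p2) for p1 in (l, u) for p2 in (l, u))

def divide_and_conquer(l0, u0, eps):
    cur_max, queue = 0.0, [(l0, u0)]
    while queue:
        l, u = queue.pop()
        if corner_bound(l, u) <= cur_max + eps:
            continue                        # region cannot beat CurMax + eps
        mid = (l + u) / 2
        cur_max = max(cur_max, mid * (1 - mid))   # sample the midpoint
        queue += [(l, mid), (mid, u)]             # split and re-enqueue
    return cur_max

# Example 5: R = [1/4, 3/4], eps = 1/8. The first bound 9/16 is inconclusive;
# after one split both sub-bounds are 3/8 <= 1/4 + 1/8, so 1/4 is returned.
print(divide_and_conquer(0.25, 0.75, 0.125))
```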

The remainder of this paper integrates monotonicity checking in this loop.

This paper addresses **three challenges**: (Sect. 4): using state bounds in the monotonicity checker. (Sect. 5): using local monotonicity in parameter lifting. (Sect. 6): integrating monotonicity in the divide-and-conquer loop.

### **4 A New Rule for Sufficient Monotonicity**

As discussed in Section 3.1, we aim to analyse whether, for a given region R, parameter x is locally monotonic at state s. The key ingredient is a pre-order on the states of the pMC at hand that is used for checking sufficient conditions for being locally monotonic. We define the pre-order and recap the "cheap" rules for efficiently determining the pre-order as adopted from [27]. We add a new, simple rule to this repertoire that lets us avoid the computationally "expensive"

<sup>4</sup> Using an *instantiation checker* that reuses model-checking results from the last guess.

rules using assumptions from [27]. The information needed to apply this new rule readily comes from parameter lifting as we will see.

Ordering states for local monotonicity. Let us consider a conceptual example showing how a pre-order on states can be used for determining local monotonicity.

Example 6. Consider the pMC M<sub>2</sub> in Fig. 2(c). We reason backwards that both states are locally monotone increasing in p. First, observe that ⊤ has a higher probability to reach the target (namely 1) than ⊥ (namely 0). Now, in s<sub>1</sub>, increasing p will move more probability mass to ⊤, and hence p is locally monotone at s<sub>1</sub>. Furthermore, we know that the reachability probability from s<sub>1</sub> lies between that of ⊥ and ⊤. Now, for s<sub>0</sub> we can use that increasing p moves more probability mass to s<sub>1</sub>, which we know has a higher probability to reach the target than ⊥.

As in [27], we determine local monotonicity by ordering states according to their reachability probability.

**Definition 6 (Reachability order).** A relation ≼<sub>R,T</sub> ⊆ S × S is a reachability order with respect to T ⊆ S and region R if for all s, t ∈ S:

> s ≼<sub>R,T</sub> t implies ∀u ∈ R. Pr<sup>s→T</sup>(u) ≤ Pr<sup>t→T</sup>(u).

The order ≼<sub>R,T</sub> is called exhaustive if the reverse implication also holds.

The relation ≼<sub>R,T</sub> is a reflexive (aka: non-strict) pre-order. The exhaustive reachability order is the union of all reachability orders, and always exists. Unless stated differently, let ≼ denote the exhaustive reachability order. If the successor states of a state s are ordered, we can conclude local monotonicity in s:

**Lemma 1.** Let s, s<sub>1</sub>, s<sub>2</sub> ∈ S with P(s, s<sub>1</sub>) = x and P(s, s<sub>2</sub>) = 1−x. Then:

for each region R: s<sub>2</sub> ≼<sub>R,T</sub> s<sub>1</sub> implies Pr<sup>s→T</sup> ↑<sup>ℓ,R</sup><sub>x</sub>.

This result suggests to look for a so-called "sufficient" reachability order:

**Definition 7 (Sufficient reachability order).** A reachability order ≼ is sufficient for parameter x if for all states s with occur(s) = {x} and s<sub>1</sub>, s<sub>2</sub> ∈ succ(s) it holds that s<sub>1</sub> ≼ s<sub>2</sub> ∨ s<sub>2</sub> ≼ s<sub>1</sub>.

Phrased differently, the reachability order ≼ is sufficient for x ∈ V if (succ(s), ≼) is a total order for all s that have transitions labelled with x. Observe that, in contrast to an exhaustive order, a sufficient order need not exist.

Ordering states efficiently. Def. 6 provides a conceptually simple scheme to order states s<sub>1</sub> and s<sub>2</sub>: compute the rational functions Pr<sup>s1→T</sup> and Pr<sup>s2→T</sup>, and compare them. As the size of these multivariate rational functions can be exponential in the number of parameters [16], this is not practically viable. To avoid this, [27] has identified a set of rules that provide sufficient criteria to order states. Some of these rules are conceptually based on the underlying graph of a pMC and are computationally cheap; other rules reason about (a partial representation of) the full rational function Pr<sup>s1→T</sup> and are computationally expensive.

**Fig. 5.** Non-trivial pMCs for deducing monotonicity.

Example 7. Using bounds avoids expensive rules: see M<sub>4</sub> in Fig. 5(a). Let R = {u(q) ∈ [1/2, 3/4], u(p) ∈ [1/2, 2/3]}. Using the solution functions p<sup>2</sup> + (1−p) · q and q · (1−q) for s<sub>1</sub> and s<sub>2</sub> yields s<sub>2</sub> ≼ s<sub>1</sub> on R. Such a rule is expensive, and the cheaper graph-based rules analogous to Ex. 6 are not applicable. However, when we use bounds from parameter lifting, we obtain U<sub>R</sub>(s<sub>2</sub>) = 3/8 and L<sub>R</sub>(s<sub>1</sub>) = 1/2; as U<sub>R</sub>(s<sub>2</sub>) ≤ L<sub>R</sub>(s<sub>1</sub>), we conclude s<sub>2</sub> ≼ s<sub>1</sub> on R. Bounds also simplify graph-based reasoning, in particular in the presence of cycles. Consider M<sub>5</sub>: as L<sub>R</sub>(s<sub>3</sub>) ≥ U<sub>R</sub>(s<sub>4</sub>), reasoning similar to Ex. 6 shows s<sub>2</sub> ≼ s<sub>1</sub>, and we immediately get results about monotonicity.

Our aim is to avoid applying the expensive rules from [27] by adding a new rule that, thanks to parameter lifting, is cheap. To obtain this rule, we assume for state s and region R to have bounds L<sub>R</sub>(s) and U<sub>R</sub>(s) at our disposal satisfying

$$L\_R(s) \le \operatorname{Pr}^{s \to T}(\vec{u}) \le U\_R(s) \quad \text{for all } \vec{u} \in R\,.$$

Such bounds can be trivially assumed to be 0 and 1 respectively, but the idea is to obtain tighter bounds by exploiting the parameter lifter. This will be further detailed in Section 5. A simple observation on these bounds yields a cheap rule (provided these bounds can be easily obtained).

**Lemma 2.** For s<sub>1</sub>, s<sub>2</sub> ∈ S and region R: L<sub>R</sub>(s<sub>1</sub>) ≥ U<sub>R</sub>(s<sub>2</sub>) implies s<sub>2</sub> ≼<sub>R,T</sub> s<sub>1</sub>.
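Lemma 2 translates directly into a cheap ordering pass. The sketch below (state names and dictionary layout hypothetical) derives order edges from per-state bounds, using the numbers from Example 7:

```python
from fractions import Fraction

def cheap_order_edges(lower, upper):
    """Lemma 2 as a sketch: L_R(s1) >= U_R(s2) implies s2 precedes s1.
    `lower` and `upper` map each state to its bounds on Pr^{s -> T}."""
    return {(s2, s1)                      # edge: s2 below s1 in the pre-order
            for s1, l1 in lower.items()
            for s2, u2 in upper.items()
            if s1 != s2 and l1 >= u2}

# Example 7: U_R(s2) = 3/8 and L_R(s1) = 1/2, hence s2 is ordered below s1.
lower = {"s1": Fraction(1, 2), "s2": Fraction(0)}
upper = {"s1": Fraction(1), "s2": Fraction(3, 8)}
print(cheap_order_edges(lower, upper))
```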

In the remainder of this section, we elaborate some technical details.

Algorithmic reasoning. The pre-order ≼ is stored by a representation of its Hasse diagram, referred to as the RO-graph. Evaluating whether two states are ordered amounts to a graph search in the RO-graph. We start off with the initial order ⊥ ≼ ⊤. Then we attempt to apply one of the cheap rules to a state s; Lemma 2 provides us with more potential to apply a cheap rule. The typical approach is to do this in a reverse topological order over the RO-graph, such that the successors of s are already ordered as much as possible. If the successor states of s are ordered, then s can be added as a vertex, and directed edges can be added between s and its successors. Otherwise, state s is added between ⊥ and ⊤. This often allows for reasoning analogous to the example. To deal with strongly connected components, rules exist [27] that add states to the order even when not

all successors are in the graph. If no cheap rule can be applied, more expensive rules using the rational functions from above or SMT-solvers are applied5.

### **5 Parameter Lifting with Monotonicity Information**

Recall that our aim is to compute some $\lambda \ge \max_{u \in R} \mathrm{Pr}^{s\to T}_{\mathcal{M}}(u) - \varepsilon$ for some fixed region $R$. In order to do so, we compute $\lambda := \max_{u \in \mathrm{relax}(R)} \mathrm{Pr}^{s\to T}_{\mathrm{relax}(\mathcal{M})}(u)$ on the iMC $\mathrm{relax}(\mathcal{M})$ obtained by relaxing the pMC $\mathcal{M}$. We discuss how to speed up this computation using local monotonicity information. In the remainder, let $\mathcal{D}$ denote $\mathrm{relax}(\mathcal{M})$ and $I$ denote $\mathrm{relax}(R)$. As we consider simple iMCs, consider a state $s$ with $\mathcal{P}(s, s_1) = x_s$ and $\mathcal{P}(s, s_2) = 1 - x_s$, where the parameter $x_s$ does not occur on other transitions. Assume the lower (upper) bound on $x_s$ is $l_s$ ($u_s$).

Analyzing (simple) iMCs. An iMC attains its maximal reachability probability by substituting every $x_s$ with either $l_s$ or $u_s$. Formally, let $\mathcal{V}(I)$ denote the corner points of the interval region $I$. Then,

$$\max_{\vec{u}\in I} \mathsf{Pr}_{\mathcal{D}}^{s\to T}(\vec{u}) \;= \max_{\vec{u}\in \mathcal{V}(I)} \mathsf{Pr}_{\mathcal{D}}^{s\to T}(\vec{u}).$$

Thus, to maximise the probability of reaching $T$, in every state $s$ either the lower or the upper bound of parameter $x_s$ has to be chosen. This induces $\mathcal{O}(2^{|S|})$ choices. They can be efficiently navigated by treating them as nondeterministic choices, i.e., by interpreting the iMC as a Markov decision process (MDP) [25].
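To make the corner-point reduction concrete, the following sketch enumerates all corner assignments of a tiny simple iMC and evaluates each induced parameter-free MC by value iteration. The model, names, and iteration budget are illustrative assumptions, not the paper's implementation (which navigates the corners via an MDP instead of enumerating them).

```python
from itertools import product

def reach_prob(x, trans, target, states, iters=200):
    # Value iteration for a parameter-free simple MC: trans[s] = (s1, s2)
    # with P(s, s1) = x[s] and P(s, s2) = 1 - x[s].
    v = {s: 1.0 if s in target else 0.0 for s in states}
    for _ in range(iters):
        for s in trans:
            if s not in target:
                s1, s2 = trans[s]
                v[s] = x[s] * v[s1] + (1 - x[s]) * v[s2]
    return v

def max_reach_by_corners(trans, bounds, target, states, init):
    # Maximise over the 2^|parameters| corner assignments of the iMC.
    params = list(trans)
    return max(
        reach_prob(dict(zip(params, corner)), trans, target, states)[init]
        for corner in product(*(bounds[s] for s in params))
    )
```

For instance, a single state with $x_s \in [0.3, 0.7]$ branching to a target and a sink attains its maximum, 0.7, at the upper corner.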

Local monotonicity helps. Assume local monotonicity establishes $s_1 \preceq s_2$, i.e., the reachability probability from $s_2$ is at least as high as from $s_1$. To maximise the reachability probability from $s$, the parameter $x_s$ should be minimised. Conversely, if $s_2 \preceq s_1$, parameter $x_s$ should be maximised. Thus, every local monotonicity result halves the number of vertices that we maximise over.

Example 8. Consider the iMC $\mathcal{M}_3$ in Fig. 3(a), which is the relaxation of the pMC $\mathcal{M}_1$ in Fig. 2(a). There are four combinations of lower and upper bounds that need to be investigated to compute the upper bound. Using local monotonicity, we deduce that $q$ should be as low as possible and $p$ as high as possible. Rather than evaluating an MDP, we thus obtain the same upper bound on the reachability probability in $\mathcal{M}_1$ by evaluating a single parameter-free Markov chain.

<sup>5</sup> In an attempt to reduce the cost of these rules, the algorithm allows for deferring proof obligations in the form of assumptions. This is detailed in [27]. For this paper, however, the only relevant aspect is that these rules are computationally expensive.

**Fig. 6.** The symbiosis of monotonicity checking and parameter lifting. Red elements are new compared to the vanilla approach in Fig. 4.

Accelerating value iteration. Parameter lifting [25] creates a single MDP (a comparatively expensive operation) and instantiates this MDP based on the region $R$ to be checked. For computing the bound $\lambda$, specifically, it uses value iteration. Roughly, this means that for each state we start with either its lower or upper bound. The instantiated MC is then checked. Then, all bounds that can be improved by switching from lower to upper bound or vice versa are swapped. This procedure terminates with the optimal assignment of all bounds. We exploit local monotonicity in this value-iteration procedure by fixing the chosen bounds at locally monotonic states.
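The bound-swapping scheme just described can be sketched as a small policy-iteration style loop: evaluate the MC induced by the current lower/upper choices, then swap every improvable choice, keeping choices at locally monotonic states fixed. The model encoding and names are illustrative assumptions.

```python
def evaluate(choice, trans, bounds, target, states, iters=200):
    # Evaluate the parameter-free MC induced by picking, in each state s,
    # the lower (choice[s] == 0) or upper (choice[s] == 1) bound of x_s.
    v = {s: 1.0 if s in target else 0.0 for s in states}
    for _ in range(iters):
        for s in trans:
            if s not in target:
                s1, s2 = trans[s]
                x = bounds[s][choice[s]]
                v[s] = x * v[s1] + (1 - x) * v[s2]
    return v

def maximise(choice, fixed, trans, bounds, target, states):
    # fixed: states whose corner is already pinned by local monotonicity.
    while True:
        v = evaluate(choice, trans, bounds, target, states)
        swapped = False
        for s in trans:
            if s in fixed:
                continue
            s1, s2 = trans[s]
            val = lambda c: bounds[s][c] * v[s1] + (1 - bounds[s][c]) * v[s2]
            best = max((0, 1), key=val)
            if val(best) > val(choice[s]):
                choice[s], swapped = best, True
        if not swapped:
            return v
```

Fixing a state's entry in `fixed` simply removes it from the swap loop, which is the acceleration described above.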

### **6 Lifting and Monotonicity, Together**

In this section, we give a more detailed account of our approach, i.e., we zoom into Fig. 1, resulting in Fig. 6. In particular, we detail the divide-and-conquer block. This loop is a refinement (indicated in red in Fig. 6) of Fig. 4. We first give an overview, before discussing some aspects in greater detail.

**Overall algorithm** The approach considers extended regions, i.e., a region $R$ is equipped with state bounds $L_R(s)$ and $U_R(s)$ such that $L_R(s) \le \mathrm{Pr}^{s\to T}_{\mathcal{M}}(u) \le U_R(s)$ for every state $s$ and every $u \in R$, and with monotonicity information about the monotonically increasing (and decreasing) parameters on $R$. Initially, the input region $R$ is extended with $L_R(s) = 0$, $U_R(s) = 1$ for every $s$, and empty monotonicity information. Additionally, we initialise the conservative approximation CurMax of the maximum probability found so far to 0. Extended regions are stored in a priority queue $Q$, where the bounds $U_R(s_I)$ are used as priorities. We discuss details below. Once initialised, we start an iterative process to update the conservative approximations $L_R$ and $U_R$.

First, (1) a region $R$ and the associated reachability order, stored as an RO-graph, are taken from the queue $Q$, and (2) its monotonicity is computed while using the annotated bounds $L_R$ and $U_R$. Let $X^R_\uparrow$ denote the globally monotonically increasing parameters on $R$ and, similarly, $X^R_\downarrow$ the decreasing parameters on $R$. For brevity, we omit the superscript $R$ in the following.

As a next step, we (3) shrink the region based on global monotonicity. We define the region $\mathrm{Shrink}_{X_\uparrow,X_\downarrow}(R)$ as follows: $\mathrm{Shrink}_{X_\uparrow,X_\downarrow}(R)(x) = l_x$ if $x \in X_\downarrow$, $\mathrm{Shrink}_{X_\uparrow,X_\downarrow}(R)(x) = u_x$ if $x \in X_\uparrow$, and $\mathrm{Shrink}_{X_\uparrow,X_\downarrow}(R)(x) = R(x)$ otherwise. In the remainder of this section, let $R'$ denote $\mathrm{Shrink}_{X_\uparrow,X_\downarrow}(R)$. Observe that we can safely discard instantiations in $R \setminus R'$, as $\max_{u\in R} \mathrm{Pr}^{s\to T}_{\mathcal{M}}(u) = \max_{u\in R'} \mathrm{Pr}^{s\to T}_{\mathcal{M}}(u)$.

Next, we (4) analyse the region $R'$ to get bounds $L_{R'}$, $U_{R'}$ using parameter lifting together with the local monotonicity information from the monotonicity check. We make two observations. First, it holds that $L_R(s) \le L_{R'}(s)$ and $U_{R'}(s) \le U_R(s)$ for every $s$: thus, there is no regret in analysing $R'$ rather than $R$. Second, if all parameters are globally monotone, the region $R'$ is a singleton and straightforward to analyse.

If (5) $U_{R'}(s_I) \le$ CurMax, then we discard $R'$ altogether and go to (1). Otherwise, we (6) guess a candidate $u \in R'$ and set CurMax to $\max(\text{CurMax}, \mathrm{Pr}^{s\to T}_{\mathcal{M}}(u))$. If (7) $\text{CurMax} + \varepsilon \ge \max_{\hat{R}\in Q\cup\{R'\}} U_{\hat{R}}(s_I)$, then we have solved our problem statement and return CurMax. Otherwise, we cannot yet give a conclusive answer and need to refine our analysis. To that end, we (8) split the region $R'$ into smaller (rectangular) regions $R_1,\ldots,R_n$. Note that these sub-regions first inherit the bounds of the region $R'$; their bounds are refined in a subsequent iteration (if any). Termination in the limit (i.e., convergence of the lower and upper bound to the limit) follows from the termination of monotonicity checking and the termination of the loop in Fig. 4.
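Steps (1)–(8) can be sketched as a generic refinement loop over a priority queue ordered by upper bounds. Here a region is just an interval, `f` plays the role of sampling the pMC, and `upper_bound` stands in for parameter lifting; all names and the one-dimensional setting are illustrative abstractions.

```python
import heapq

def find_max(f, upper_bound, region, eps):
    # Divide-and-conquer maximisation up to precision eps.
    cur_max = 0.0
    counter = 0            # "age" of a region, used as tie-breaker
    queue = [(-upper_bound(region), counter, region)]  # max-heap on U
    while queue:
        neg_u, _, (lo, hi) = heapq.heappop(queue)
        if -neg_u <= cur_max:           # step (5): discard the region
            continue
        mid = (lo + hi) / 2
        cur_max = max(cur_max, f(mid))  # step (6): guess a candidate
        if cur_max + eps >= -neg_u:     # step (7): region is resolved
            continue
        for sub in ((lo, mid), (mid, hi)):  # step (8): split
            counter += 1
            heapq.heappush(queue, (-upper_bound(sub), counter, sub))
    return cur_max
```

A usable `upper_bound` must over-approximate the maximum of `f` on the region and tighten as regions shrink, exactly as the parameter-lifting bounds do.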

**Incrementality** A key aspect in tuning iterative approaches is incrementality, i.e., reusing previously computed information in later computation steps. Parameter lifting is already incremental [25], reusing the MDP structure in an efficient manner. Let us address incrementality for the monotonicity checker. Notice that all monotonicity information and all bounds computed for region $R$ carry over to any $\hat{R} \subseteq R$. In particular, $s \preceq_{R,T} s'$ implies $s \preceq_{\hat{R},T} s'$. Furthermore, our monotonicity checker may give up in an iteration if no cheap rules to determine monotonicity can be applied. In that case, we annotate the current reachability order such that, after refining bounds in a subsequent iteration, we can quickly check where we gave up in the last iteration and whether the refined bounds allow progress in constructing the reachability order. Notice that in principle we have to duplicate the order for each region. However, we do this only as long as the monotonicity checker has not stabilized. The checker stabilizes, e.g., if an order is sufficient. Once the checker has stabilized, we do not duplicate the order anymore (as no more local or global monotonicity can be deduced).

**Heuristics** Our approach allows for several choices in the implementation. Whereas the correctness of the approach does not depend on how these choices are resolved, they have a significant influence on performance. We discuss (what we believe to be) the most important choices and how we resolved them in the current implementation.

Initialising CurMax. Previously, Storm was applicable only to few parameters and generously initialised CurMax by sampling all vertices $\mathcal{V}(R)$, which is exponential in the number of parameters. To scale to more parameters, we discard this sampling. Instead, we sample each parameter independently to find out which parameters are definitely not monotone. Naturally, we skip parameters already known to be monotone. We select sample points as follows: we distribute 50 points evenly along the dimension of the parameter. All other parameter values are fixed: non-monotonic parameters are set to the middle point of their interval (as described by the region); monotone parameters are set to the upper (lower) bound when possibly monotonically increasing (decreasing).
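The per-parameter sweep can be sketched as follows: sample 50 evenly spaced points along one parameter while pinning the others, and flag the parameter as definitely not monotone if the sampled values are sorted in neither direction. The function `f` and all names are illustrative stand-ins for evaluating the pMC's solution function.

```python
def definitely_not_monotone(f, region, param, fixed, n=50):
    # region[param] = (lo, hi); fixed pins all other parameters.
    lo, hi = region[param]
    pts = [lo + i * (hi - lo) / (n - 1) for i in range(n)]
    vals = []
    for p in pts:
        args = dict(fixed)
        args[param] = p
        vals.append(f(**args))
    inc = all(a <= b for a, b in zip(vals, vals[1:]))
    dec = all(a >= b for a, b in zip(vals, vals[1:]))
    return not (inc or dec)
```

Note the one-sided nature of the test: it can only refute monotonicity, never prove it, which is exactly how it is used here.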

Updating CurMax. To prove that CurMax is close to the maximum, it is essential to find a large value for CurMax fast. In our experience, sampling at too many places within regions yields significant overhead, but taking $L(s_I)$ is too pessimistic a way to update CurMax. To update CurMax, we select a single $u \in R'$ in the middle of region $R'$. As we may have shrunk the region $R$, the middle of $R'$ need not coincide with the middle of $R$, which yields behaviour different from the vanilla refinement loop.

How and where to split? There are two important splitting decisions: First, we need to select the dimensions (aka parameters) in which we split. Second, we need to decide where to split along these dimensions. We had little success with simple attempts to split at better places, so the least informative split, in the middle, remains our choice. However, we have changed where (in which parameters or dimensions) to split. Naturally, we do not (need to) split in monotonic parameters. Previously, parameter lifting split in every dimension at once. Let us illustrate that this quickly becomes infeasible: assume 10 parameters. Splitting the initial region once yields 1024 regions. Splitting half of them again yields more than 500,000 regions. Instead, we use region estimates, which are heuristic values for every parameter, based on the implementation of [19]. These estimates, provided by the parameter lifter, essentially consider how well the policy on the MDP (selecting upper or lower bounds in the iMC) agrees with the dependencies induced by a parameter: the more it agrees, the lower the value. The key idea is that one obtains tighter bounds if the policy adheres to the dependencies induced by the parameters<sup>6</sup>. We split in the dimension with the largest estimate. If the largest region estimate is smaller than $10^{-4}$, then we split in the dimension of $R'$ with the widest interval.
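The dimension-selection rule can be sketched in a few lines; the estimate map and threshold are taken from the description above, while the data layout is an illustrative assumption.

```python
def split_dimension(region, estimates, monotone, threshold=1e-4):
    # region: parameter -> (lo, hi); estimates: parameter -> heuristic value;
    # monotone: parameters we never split in.
    candidates = [x for x in region if x not in monotone]
    best = max(candidates, key=lambda x: estimates.get(x, 0.0))
    if estimates.get(best, 0.0) >= threshold:
        return best
    # all estimates negligible: fall back to the widest interval
    return max(candidates, key=lambda x: region[x][1] - region[x][0])
```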

Priorities in the region queue. In contrast to [25], we want to find the extremal value within the complete region, rather than partitioning the state space. The standard technique splits based on the size of the region and de facto performs a breadth-first search. When we split a region, we instead prioritize the subregions $\hat{R} \subseteq R'$ by $U_{R'}(s_I)$, as $U_{\hat{R}}(s_I) \le U_{R'}(s_I)$. We use the age of a region to break ties. A wide range of exploration strategies is possible here. To avoid overfitting, we refrain in the experiments from weighting different aspects of the region, but the current choice is likely not the final answer.

<sup>6</sup> Technically, the value is computed as the sum of the differences between the local lower and upper bound on the reachability probability over all states with this parameter.


**Table 1.** Overview of the experimental results comparing vanilla parameter lifting to the integrated approach

Obtaining bounds for the monotonicity checker. While the baseline loop only computes upper bounds, we also use lower bounds to boost the monotonicity checking. We currently refine these bounds until the monotonicity checker has stabilized. We observe that, mostly due to numerical computations, the time taken to compute the lower bounds can be significant, whereas the overhead and the merits of obtaining larger lower bounds are hard to forecast.

### **7 Empirical Evaluation**

Setup. We investigate the performance of the extended divide-and-conquer approach presented in Fig. 6. We have implemented the algorithm explained above in the probabilistic model checker Storm [11]. We compare its performance against vanilla parameter lifting, outlined in Fig. 4, as the baseline. Both versions use the same underlying data structures and version of Storm. All experiments were executed on a single core of an Intel Xeon Platinum 8160 CPU. We used neither parallel processing nor randomization. We used a time-out of 1800 s and a memory limit of 32 GB. We exclude model-building times from all experiments and emphasize that they coincide for the vanilla and new implementations.

Benchmarks and results. The common benchmarks Crowds, BRP, and Zeroconf have only globally monotonic parameters (and only two of them). Using monotonicity, they become trivial. The structure of NAND and Consensus makes them not amenable to monotonicity checking, and the performance mostly resembles the baseline. We selected additional benchmarks from [2], [23], and [18]; see below. The models from the latter two sources are originally formulated as partially observable MDPs and were translated into pMCs using the approach in [19].

Table 1 summarizes the results for benchmarks identified by their name and instance. We list the number of states, transitions, and parameters of the pMC. For each benchmark, we consider two values for ε: ε=0.05 and ε=0.1. For each ε, we consider the time t required and the number (**i**) of iterations that the integrated loop and the baseline require. For the integrated loop, we additionally provide the number (**i**<sub>b</sub>) of extra (lower bound) parameter lifting invocations needed to assist the monotonicity checker.

Discussion of the results. We make the following observations.


In general, for ε=0.1, the number of regions that need to be considered is relatively small, and guessing an (almost) optimal value is not that important. This means that the results are less sensitive to changes in the heuristics. For ε=0.05, it is significantly trickier to get this right. Monotonicity helps us guess a good initial point. Furthermore, it tells us in which parameters we should and should not split, preventing unnecessary splitting in some of the parameters.

### **8 Conclusion and Future Work**

This paper has presented a new technique for tackling the optimal synthesis problem: which instance of a parametric Markov chain satisfies a reachability objective optimally? The key concept is a deep interplay between parameter lifting, so far the favourable technique for this problem, and monotonicity checking. Experiments showed encouraging results: speed-ups of up to two orders of magnitude on various benchmarks, and scalability to an increased number of parameters. Future work includes advanced sampling techniques and applying this approach to other application areas, such as optimal synthesis and monotonicity in probabilistic graphical models [26] and hyper-properties in security [1].

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

#### **Inductive Synthesis for Probabilistic Programs Reaches New Horizons**

Roman Andriushchenko<sup>1</sup>, Milan Češka (✉)<sup>1</sup>, Sebastian Junges<sup>2</sup>, and Joost-Pieter Katoen<sup>3</sup>

<sup>1</sup> Brno University of Technology, Brno, Czech Republic ceskam@fit.vutbr.cz
<sup>2</sup> University of California, Berkeley, USA
<sup>3</sup> RWTH Aachen University, Aachen, Germany

**Abstract.** This paper presents a novel method for the automated synthesis of probabilistic programs. The starting point is a program sketch representing a finite family of finite-state Markov chains with related but distinct topologies, and a reachability specification. The method builds on a novel inductive oracle that greedily generates counter-examples (CEs) for violating programs and uses them to prune the family. These CEs leverage the semantics of the family in the form of bounds on its best- and worst-case behaviour provided by a deductive oracle using an MDP abstraction. The method further monitors the performance of the synthesis and adaptively switches between inductive and deductive reasoning. Our experiments demonstrate that the novel CE construction provides a significantly faster and more effective pruning strategy leading to an accelerated synthesis process on a wide range of benchmarks. For challenging problems, such as the synthesis of decentralized partially-observable controllers, we reduce the run-time from a day to minutes.

### **1 Introduction**

Background and motivation. Controller synthesis for Markov decision processes (MDPs [35]) and temporal logic constraints is a well-understood and tractable problem, with a plethora of mature tools providing efficient solving capabilities. However, the applicability of these controllers to a variety of systems is limited: Systems may be decentralized, controllers may not be able to observe the complete system state, cost constraints may apply, and so forth. Adequate operational models for these systems exist in the form of decentralized partially-observable MDPs (DEC-POMDPs [33]). The controller synthesis problem for these models is undecidable [30], and tool support (for verification tasks) is scarce.

This paper takes a different approach: the controller together with the environment can be modelled as probabilistic program sketches where "holes" in the probabilistic program model choices that the controller may make. Conceptually, the controllers of the DEC-POMDP are described by a user-defined finite

This work has been partially supported by the Czech Science Foundation grant GJ20-02328Y and the ERC AdG Grant 787914 FRAPPANT, the NSF grants 1545126 (VeHICaL) and 1646208, by the DARPA Assured Autonomy program, by Berkeley Deep Drive, and by Toyota under the iCyPhy center.

<sup>©</sup> The Author(s) 2021

J. F. Groote and K. G. Larsen (Eds.): TACAS 2021, LNCS 12651, pp. 191–209, 2021. https://doi.org/10.1007/978-3-030-72016-2_11

family $\mathcal{M}$ of Markov chains. The synthesis problem that we consider is to find a Markov chain $M$ (i.e., a probabilistic program) in the family $\mathcal{M}$ such that $M \models \varphi$, where $\varphi$ is the specification. To allow efficient algorithms, the family must have some structure. In particular, in our setting, the family is parameterized by a set of discrete parameters $K$; an assignment $K \to V$ of these parameters to concrete values $V$ from their associated domains yields a family member, i.e., a Markov chain (MC). Such a parameterization is naturally obtained from the probabilistic program sketch, where some constants (or program parts) are left open. The search for a family member can thus be considered as the search for a hole assignment. This approach fits within the realm of syntax-guided synthesis [2].

Motivating example. Herman's protocol [24] is a well-studied randomized distributed algorithm aimed at obtaining fast stabilization on average. In [26], a family $\mathcal{M}$ of MCs is used to model different protocol instances. The authors considered each instance separately and found which of the controllers for Herman's protocol performs best. Let us consider the protocol in a bit more detail: it concerns self-stabilization of a unidirectional ring of network stations where all stations have to behave similarly, i.e., an anonymous network. Each station stores a single bit and can read the internal bit of one (say, the left) neighbour. To achieve stabilization, a station for which the two readable bits coincide updates its own bit based on the outcome of a coin flip. The challenge is to select a controller that flips this coin with an optimal bias, i.e., minimizing the expected time until stabilization. In a setting where the probabilities range over 0.1, 0.2, ..., 0.9, this results in analyzing nine different MCs. Does the expected time until stabilization reduce if the controllers are additionally allowed a single bit of memory? In every step, there are 9·9 combinations for selecting the coin flip, and for each memory cell and coin flip outcome, the memory can now be updated, yielding 2·2·2 possibilities. This one-bit extension thus results in a family of 648 models. If, in addition, one allows stations to make decisions depending on the token-bits, both the coin flips and the memory updates are multiplied by a factor 4, yielding 10,368 models. Eventually, analyzing all individual MCs becomes infeasible.
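The family sizes in this example follow from simple counting; as a quick sanity check (illustrative arithmetic, not part of the paper's artifact):

```python
# Counting the design spaces for Herman's protocol described above.
biases = 9                                   # coin biases 0.1 .. 0.9
memoryless = biases                          # nine MCs, one per bias
one_bit = (biases * biases) * (2 * 2 * 2)    # 9*9 coin choices, 2*2*2 memory updates
with_token_bits = one_bit * 4 * 4            # coin flips and memory updates each refined by 4
print(memoryless, one_bit, with_token_bits)  # 9 648 10368
```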

Oracle-guided synthesis. To tackle the synthesis problem, we introduce an oracle-guided inductive synthesis approach [25,39]. A learner selects a family member and passes it to the oracle. The oracle answers whether the family member satisfies $\varphi$ and, crucially, gives additional information in case it does not. Inspired by [9], if the family member violates the specification $\varphi$, our oracle returns a set $K'$ of parameters such that all family members obtained by changing only the values assigned to $K'$ violate $\varphi$. We argue that such an oracle must (1) induce little overhead in providing $K'$, (2) be aware of the existence of parameters in the family, and (3) have (a resemblance of) awareness of the semantics of the parameters and their values.

Oracles. With these requirements in mind, we construct a counterexample (CE) based oracle from scratch. We do so by carefully exploiting existing methods. We construct critical subsystems as CEs [1]. Critical subsystems are parts of the MC that suffice to refute the specification. If a hole is absent in a CE, its value is irrelevant. To avoid the cost of finding optimal CEs, an NP-hard problem [19], we consider greedy CEs similar to [9]. However, our greedy CEs are aware of the parameters and try to limit the occurrence of parameters in the CE. Finally, to provide awareness of the semantics of parameter values, we provide lower and upper bounds on all states: their difference indicates how much varying the value at a hole may change the overall reachability probability. These bounds are efficiently computed by another oracle. This oracle analyses a quotient MDP obtained by employing an abstraction method that is part of the abstraction-refinement loop in [10].

A hybrid variant. The two oracles are significantly different. Abstraction refinement is deductive: it argues about single family members by considering (an aggregation of) all family members. The critical subsystem oracle is inductive: by examining a single family member, it infers statements about other family members. This suggests a middle ground: a hybrid strategy monitors the performance of the two oracles during the synthesis and suggests their best usage. More precisely, the hybrid strategy integrates the counterexample-based oracle into the abstraction-refinement loop.

Major results. We present a novel and dedicated oracle deployed in an efficacious synthesis loop. We use model-checking results on an abstraction to tailor smaller CEs. Our greedy and family-aware CE construction is substantially faster than the use of optimal CEs. Together, these two improvements yield CEs that are on par with optimal CEs but are found much faster. The integration of multiple abstraction-refinement steps yields superior performance. We compare our performance with the abstraction-refinement loop from [10], using benchmarks from [10]. Benchmarks can be classified along two dimensions: (A) benchmarks with a structure suited to CE generation, and (B) benchmarks with a structure suited to abstraction-refinement. A-benchmarks are a natural strength of our novel oracle. Our simple, efficient hybrid strategy significantly outperforms the state-of-the-art on A-benchmarks, while it only yields limited overhead on B-benchmarks. Most importantly, the novel hybrid strategy can solve benchmarks that are out of reach for pure abstraction-refinement or pure CE-based reasoning. In particular, our hybrid method is able to synthesize the optimal Herman protocol with memory: the synthesis time on a design space with 3.1 million candidate programs reduces from a day to minutes.

**Related work** The synthesis problems for parametric probabilistic systems can be divided into the following two categories.

Topology synthesis, akin to the problem considered in this paper, assumes a finite set of parameters affecting the MC topology. Finding an instantiation satisfying a reachability property is NP-complete in the number of parameters [12] and can naively be solved by analyzing all individual family members. An alternative is to model the MC family as an MDP and resort to standard MDP model-checking algorithms. Tools such as ProFeat [13] or QFLan [40] take this approach

to quantitatively analyze alternative designs of software product lines [21,28]. These methods are limited to small families. This motivated (1) abstraction-refinement over the MDP representation [10] and (2) counterexample-guided inductive synthesis (CEGIS) for MCs [9], mentioned earlier. The alternative problem of sketching for probabilistic programs that fit given data is studied, e.g., in [32,38].

Parameter synthesis considers models with uncertain parameters associated with transition probabilities and analyses how the system behaviour depends on the parameter values. The most promising techniques are based on parameter lifting, which treats identical parameters on different transitions independently [8,36] and has been implemented in the state-of-the-art probabilistic model checkers Storm [18] and PRISM [27]. An alternative approach, based on computing rational functions for the satisfaction probability, has been proposed in [15] and further improved in [22,17,4]. This approach has also been applied to different problems such as model repair [5,34,11].

Both synthesis problems can also be attacked by search-based techniques that do not ensure an exhaustive exploration of the parameter space. These include evolutionary techniques [23,31] and genetic algorithms [20]. Combinations with parameter synthesis have been used [7] to synthesize robust systems.

### **2 Problem Statement**

We formalize the essential ingredients and the problem statement. See [3] for more material.

Sets of Markov chains. A (discrete) distribution over a finite set $X$ is a function $\mu\colon X \to [0, 1]$ s.t. $\sum_{x \in X} \mu(x) = 1$. The set $Distr(X)$ contains all distributions over $X$. The support of $\mu \in Distr(X)$ is $supp(\mu) = \{x \in X \mid \mu(x) > 0\}$.

**Definition 1 (MC).** A Markov chain (MC) is a tuple $D = (S, s_0, P)$, where $S$ is a finite set of states, $s_0 \in S$ is an initial state, and $P\colon S \to Distr(S)$ is a transition probability function. We write $P(s, t)$ to denote $P(s)(t)$. The state $s$ is absorbing if $P(s, s) = 1$.

Let $K$ denote a finite set of discrete parameters, each with a finite domain $V_k$. For brevity, we often assume that all domains are the same and omit the subscript $k$. A realization $r$ maps parameters to values in their domain, i.e., $r\colon K \to V$. Let $\mathcal{R}^{\mathcal{D}}$ denote the set of all realizations of a set $\mathcal{D}$ of MCs. A $K$-parameterized set of MCs $\mathcal{D}(K)$ contains the MCs $\mathcal{D}_r$, for every $r \in \mathcal{R}^{\mathcal{D}}$. In Sect. 3, we give an operational model for such sets. In particular, realizations will fix the targets of transitions. In our experiments, we describe these sets using the PRISM modelling language, where parameters are described by undefined integer values.

Properties and specifications. For simplicity, we consider (unbounded) reachability properties<sup>1</sup>. For a set $T \subseteq S$ of target states, let $\mathbb{P}[D, s \models \lozenge T]$ denote

<sup>1</sup> Our implementation also supports expected reachability rewards.

the probability in MC $D$ of eventually reaching some state in $T$ when starting in the state $s \in S$. A property $\varphi \equiv \mathbb{P}_{\sim\lambda}[\lozenge T]$ with $\lambda \in [0, 1]$ and $\sim \in \{\le, \ge\}$ expresses that the probability to reach $T$ relates to $\lambda$ according to $\sim$. If $\sim$ is $\le$, then $\varphi$ is a safety property; otherwise, it is a liveness property. Formally, state $s$ in MC $D$ satisfies $\varphi$ if $\mathbb{P}[D, s \models \lozenge T] \sim \lambda$. The MC $D$ satisfies $\varphi$ if the above holds for its initial state. A specification is a set of properties $\Phi = \{\varphi_i\}_{i \in I}$, and $D \models \Phi$ if $\forall i \in I\colon D \models \varphi_i$.
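Checking a single reachability property on a small concrete MC can be sketched with value iteration (for MCs whose non-target behaviour eventually gets absorbed; a general solver would use a linear-equation system after graph preprocessing). The toy chain and names are illustrative assumptions.

```python
import operator

def reach(P, target, s0, iters=200):
    # Value iteration approximating P[D, s0 |= <>T] on a finite MC.
    # P[s] is a dict mapping successor states to probabilities.
    states = set(P) | {t for dist in P.values() for t in dist}
    v = {s: 1.0 if s in target else 0.0 for s in states}
    for _ in range(iters):
        for s, dist in P.items():
            if s not in target:
                v[s] = sum(p * v[t] for t, p in dist.items())
    return v[s0]

def satisfies(P, target, s0, rel, lam):
    # D |= P_{rel lam}[<>T], checked at the initial state s0.
    return rel(reach(P, target, s0), lam)

# A fair coin flip: the probability to reach "heads" is 1/2.
P = {"s0": {"heads": 0.5, "tails": 0.5},
     "heads": {"heads": 1.0}, "tails": {"tails": 1.0}}
assert satisfies(P, {"heads"}, "s0", operator.ge, 0.4)
```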

Problem statement. The key problem statement in this paper is feasibility:

Given a parameterized set of Markov chains $\mathcal{D}(K)$ over parameters $K$ and a specification $\Phi$, find a realization $r\colon K \to V$ such that $\mathcal{D}_r \models \Phi$.

When $\mathcal{D}$ is clear from the context, we often write $r \models \Phi$ to denote $\mathcal{D}_r \models \Phi$.

We additionally consider the optimizing variant of the synthesis problem. The maximal synthesis problem asks: given a maximizing property $\varphi_{\max} \equiv \mathbb{P}_{\sim\lambda}[\lozenge T]$, identify $r^* \in \arg\max_{r \in \mathcal{R}^{\mathcal{D}}} \{\mathbb{P}[\mathcal{D}_r \models \lozenge T] \mid \mathcal{D}_r \models \Phi\}$, provided it exists. The minimal synthesis problem is defined analogously.

As the state space S, the set K of parameters, and their domains are all finite, the above synthesis problems are decidable. One possible solution, called the one-by-one approach [14], considers each realization r ∈ R<sup>D</sup> individually. The state-space and parameter-space explosion renders this approach unusable for large problems, necessitating advanced techniques that exploit the family structure.

### **3 Counterexample-Guided Inductive Synthesis**

In this section, we recap a baseline counterexample-guided inductive synthesis (CEGIS) loop, as put forward in [9]. In particular, we first instantiate an oracle-guided synthesis method, then discuss an operational model for families that gives structure to the parameterized set of Markov chains, and finally detail how CEs are used to create an oracle.

Consider Fig. 1. A learner takes a set R of realizations and has to find a realization r ∈ R such that D<sub>r</sub> satisfies the specification Φ. The learner maintains (a symbolic representation of) a set Q ⊆ R of realizations that still need to be checked. It iteratively asks the oracle whether a particular r ∈ Q is a solution. If it is, the oracle reports success.

Otherwise, the oracle returns a set R′ containing r and potentially more realizations, all violating Φ. The learner then prunes R′ from Q. In Section 4, we focus on creating an efficient oracle that computes a set R′ (with r ∈ R′) of realizations that all violate Φ. In Section 5, we provide a more advanced framework that extends this method. The remainder of this section lays the groundwork for these sections.

**Families of Markov chains** To avoid the need to iterate over all realizations, an efficient oracle exploits some structure of the family. In this paper, we focus on sets of Markov chains having different topologies. We explain our concepts using the operational model of families given in [10]. Our implementation supports (more expressive) PRISM programs with undefined integer constants.

**Definition 2 (Family of MCs).** A family of MCs is a tuple D = (S, s<sub>0</sub>, K, B), where S and s<sub>0</sub> are as before, K is a finite set of parameters with domain V<sub>k</sub> ⊆ S for each k ∈ K, and B : S → Distr(K) is a family of transition probability functions.

Function B of a family D of MCs maps each state to a distribution over parameters K. In the context of the synthesis of probabilistic models, these parameters represent unknown options or features of a system under design. Realizations are now defined as follows.

**Definition 3 (Realization).** A realization of a family D = (S, s<sub>0</sub>, K, B) of MCs is a function r : K → S s.t. r(k) ∈ V<sub>k</sub> for all k ∈ K. Realization r induces the MC D<sub>r</sub> = (S, s<sub>0</sub>, B<sub>r</sub>), where B<sub>r</sub>(s, s′) = Σ<sub>k∈K, r(k)=s′</sub> B(s)(k) for any pair of states s, s′ ∈ S. The set of all realizations of D is denoted R<sup>D</sup>.

The set R<sup>D</sup> = ∏<sub>k∈K</sub> V<sub>k</sub> of all possible realizations is exponential in |K|.

**Counterexample-guided oracles** We first consider the feasibility synthesis for a single-property specification and later, cf. Remark 1, generalize this to multiple properties and to optimal synthesis. The notion of counterexamples is at the heart of the oracle from [9] and Sect. 4.

If an MC D ⊭ ϕ, a counterexample (CE) based on a critical subsystem can serve as diagnostic information about the source of the failure. We consider the following notion of a CE, motivated by the critical subsystems of [37].

**Definition 4 (Counterexample).** Let D = (S, s<sub>0</sub>, *P*) be an MC and s<sub>⊥</sub> ∉ S. The sub-MC of D induced by C ⊆ S is the MC D↓C = (S ∪ {s<sub>⊥</sub>}, s<sub>0</sub>, *P*′), where the transition probability function *P*′ is defined by:

$$\mathcal{P}'(s) = \begin{cases} \mathcal{P}(s) & \text{if } s \in C, \\ [s_\perp \mapsto 1] & \text{otherwise.} \end{cases}$$

The set C and the sub-MC D↓C are called a counterexample (CE) for the property P<sub>≤λ</sub>[♦T] on MC D if D↓C ⊭ P<sub>≤λ</sub>[♦(T ∩ (C ∪ {s<sub>0</sub>}))].

Let D<sub>r</sub> be an MC violating the specification ϕ. To compute other realizations violating ϕ, the oracle computes a critical subsystem D<sub>r</sub>↓C, which is then used to deduce a so-called conflict for D<sub>r</sub> and ϕ.

**Definition 5 (Conflict).** For a family of MCs D = (S, s<sub>0</sub>, K, B) and C ⊆ S, the set K<sub>C</sub> of relevant parameters (called conflict) is given by K<sub>C</sub> = ⋃<sub>s∈C</sub> supp(B(s)).

**Fig. 2.** Counterexamples for smaller conflicts.

It is straightforward to compute a set of violating realizations from a conflict. The generalization of realization r induced by the set K<sub>C</sub> ⊆ K of relevant parameters is the set r↑K<sub>C</sub> = {r′ ∈ R | ∀k ∈ K<sub>C</sub> : r′(k) = r(k)}. We often use the term conflict to refer to its generalization. The size of a conflict, i.e., the number |K<sub>C</sub>| of relevant parameters, is crucial: small conflicts potentially generalize r to large subfamilies r↑K<sub>C</sub>. It is thus important that CEs contain as few parameterized transitions as possible; the size of a CE in terms of the number of states is not of interest. Furthermore, the overhead of providing CEs should be justified by the payoff: finding a large generalization may take some time, but small generalizations should be returned quickly. The CE-based oracle in [9] uses an off-the-shelf CE procedure [16,41] and mostly does not provide small CEs.
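The conflict of Definition 5 and its generalization r↑K<sub>C</sub> can be made concrete in a small Python sketch; the parameter domains and state supports below are illustrative toy values, not taken from any benchmark:

```python
from itertools import product

# Toy family structure (hypothetical values): parameter domains and,
# per state, the parameters occurring in B(s) (the support of B(s)).
domains = {"X": ["s1", "s2"], "Y": ["t", "f"]}
supp = {"s0": {"X"}, "s1": {"Y"}, "s2": {"Y"}, "t": set(), "f": set()}

def conflict(C):
    """Definition 5: relevant parameters of a critical subsystem C."""
    return set().union(*(supp[s] for s in C)) if C else set()

def generalize(r, K_C):
    """r|K_C: all realizations that agree with r on the conflict K_C."""
    keys = sorted(domains)
    out = []
    for vals in product(*(domains[k] for k in keys)):
        cand = dict(zip(keys, vals))
        if all(cand[k] == r[k] for k in K_C):
            out.append(cand)
    return out

r0 = {"X": "s1", "Y": "t"}
K_C = conflict({"s0"})        # a CE consisting only of s0 -> conflict {X}
pruned = generalize(r0, K_C)  # all realizations sharing X = s1 with r0
print(K_C, len(pruned))
```

With |K<sub>C</sub>| relevant parameters out of |K|, the generalization contains ∏<sub>k∉K<sub>C</sub></sub> |V<sub>k</sub>| members, so every parameter dropped from the conflict multiplies the pruned set.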

### **4 A Smart Oracle with Counterexamples and Abstraction**

This section develops an oracle based on CEs, tailored for use in the oracle-guided inductive synthesis loop described in Sect. 3.

Before going into details, we provide some illustrative examples.

**A motivating example** First, we illustrate what it means to take CEs that lead to small conflicts. Consider Fig. 2, with a family member D<sub>r</sub> (left), where the superscript of a state identifier s<sub>i</sub> denotes the parameters relevant to s<sub>i</sub>. Consider the safety property ϕ ≡ P<sub>≤0.4</sub>[♦{t}]. Clearly, D<sub>r</sub> ⊭ ϕ, and we can construct two CEs: C<sub>1</sub> = {s<sub>0</sub>, s<sub>3</sub>, t} (center) and C<sub>2</sub> = {s<sub>0</sub>, s<sub>1</sub>, s<sub>2</sub>, t} (right) with conflicts K<sub>C<sub>1</sub></sub> = {X, Y} and K<sub>C<sub>2</sub></sub> = {X}, respectively. This illustrates that a smaller CE does not necessarily induce a smaller conflict.

We now illustrate awareness of the semantics of parameters. Consider the family D = (S, s<sub>0</sub>, K′, B), where S = {s<sub>0</sub>, s<sub>1</sub>, s<sub>2</sub>, t, f}, the parameters are K′ = {X, Y, T′, F′} with domains V<sub>X</sub> = {s<sub>1</sub>, s<sub>2</sub>}, V<sub>Y</sub> = {t, f}, V<sub>T′</sub> = {t}, V<sub>F′</sub> = {f}, and the family B of transition probability functions is defined in Fig. 3 (left). As the

B(s<sub>0</sub>) = [X ↦ 1], B(s<sub>1</sub>) = [T′ ↦ 0.6, Y ↦ 0.2, F′ ↦ 0.2], B(s<sub>2</sub>) = [T′ ↦ 0.2, Y ↦ 0.2, F′ ↦ 0.6], B(t) = [T′ ↦ 1], B(f) = [F′ ↦ 1]

**Fig. 3.** A family D of four Markov chains (unreachable states are grayed out).

parameters T′ and F′ can each take only one value, we consider K = {X, Y} as the set of parameters. There are |V<sub>X</sub>| × |V<sub>Y</sub>| = 4 family members, depicted in Fig. 3 (right). For conciseness, we omit some of the transition probabilities (recall that transition probabilities sum to one). Only realization r<sub>3</sub> satisfies the safety property ϕ ≡ P<sub>≤0.3</sub>[♦{t}].
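This example family is small enough to illustrate the one-by-one approach directly. The following Python sketch is our own encoding (with the fixed-domain parameters written `T` and `F`): `induce` implements Definition 3, reachability is computed by plain value iteration, and all four realizations are enumerated:

```python
from itertools import product

# The family of Fig. 3: B maps each state to a distribution over
# parameters; a realization r maps each parameter to a state in its domain.
B = {
    "s0": {"X": 1.0},
    "s1": {"T": 0.6, "Y": 0.2, "F": 0.2},
    "s2": {"T": 0.2, "Y": 0.2, "F": 0.6},
    "t":  {"T": 1.0},
    "f":  {"F": 1.0},
}
domains = {"X": ["s1", "s2"], "Y": ["t", "f"], "T": ["t"], "F": ["f"]}

def induce(r):
    """Induced MC D_r: B_r(s, s') = sum of B(s)(k) over k with r(k) = s'."""
    P = {s: {} for s in B}
    for s, dist in B.items():
        for k, p in dist.items():
            P[s][r[k]] = P[s].get(r[k], 0.0) + p
    return P

def reach(P, s0="s0", target=("t",)):
    """Probability of eventually reaching the target (value iteration)."""
    x = {s: float(s in target) for s in P}
    for _ in range(100):  # plenty for this small, acyclic toy model
        x = {s: 1.0 if s in target else
                sum(p * x[v] for v, p in P[s].items()) for s in P}
    return x[s0]

# One-by-one approach: enumerate every realization in R^D.
results = {}
for vals in product(*(domains[k] for k in sorted(domains))):
    r = dict(zip(sorted(domains), vals))
    results[(r["X"], r["Y"])] = round(reach(induce(r)), 6)

print(results)  # only (X=s2, Y=f), i.e. r3, satisfies P<=0.3
```

Already for this toy family the one-by-one loop performs |V<sub>X</sub>| · |V<sub>Y</sub>| full model checks; the number of such checks grows exponentially with |K|.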

*CEGIS [9] illustrated:* Consider running CEGIS, and assume the oracle gets realization r<sub>0</sub> first. A model checker reveals P[D<sub>r0</sub>, s<sub>0</sub> ⊨ ♦T] = 0.8 > 0.3. The CE for D<sub>r0</sub> and ϕ contains the (only) path to the target, s<sub>0</sub> → s<sub>1</sub> → t, having probability 0.8 > 0.3. The corresponding CE C = {s<sub>0</sub>, s<sub>1</sub>, t} induces the conflict K<sub>C</sub> = {X, Y}. None of the parameters is generalized. The same argument applies to any subsequent realization: the constructed CEs do not allow for generalization, the oracle returns only the passed realization, and the learner keeps iterating until accidentally guessing r<sub>3</sub>.

Can we do better? To answer this, consider CE generation as a game: the Pruner creates a critical subsystem C; the Adversary wins if it finds an MC containing C that satisfies ϕ, thus refuting that C is a counterexample. Off-the-shelf CE generators construct a critical subsystem C such that every possible extension of C is a CE; these are CEs without context. In our setting, we change the game: the Adversary may not extend the MC arbitrarily, but must choose a family member. These are CEs modulo a family.

Back to the example: observe that for a CE for D<sub>r0</sub>, we could omit states t and s<sub>1</sub> from the set C of critical states: we know for sure that, once D<sub>r0</sub> takes the transition (s<sub>0</sub>, s<sub>1</sub>), it reaches the target state t with probability at least 0.6. This exceeds the threshold 0.3 regardless of the value of parameter Y. Hence, for family D, the set C = {s<sub>0</sub>} is a critical subsystem. The immediate advantage is that this set induces the conflict K<sub>C</sub> = {X} (parameter Y has been generalized), which enables us to reject all realizations in r<sub>0</sub>↑K<sub>C</sub> = {r<sub>0</sub>, r<sub>1</sub>}. It is 'easier' to construct a CE for a (sub)family than for arbitrary MCs. More generally, a successful oracle needs access to useful bounds and must effectively integrate them into the CE generation.

**Counterexample construction** We develop an algorithm using bounds on reachability probabilities, similar to the bounds used above. Assume that for some set of realizations R and for every state s we have bounds *lb*<sup>R</sup>(s) and *ub*<sup>R</sup>(s) such that for every r ∈ R: *lb*<sup>R</sup>(s) ≤ P[D<sub>r</sub>, s ⊨ ♦T] ≤ *ub*<sup>R</sup>(s). Such bounds always exist (take 0 and 1); we see later how to compute tighter ones. In what follows, we fix r and denote D<sub>r</sub> = (S, s<sub>0</sub>, *P*). Assume D<sub>r</sub> violates a safety property ϕ ≡ P<sub>≤λ</sub>[♦T]. The following definition is central:

**Definition 6 (Rerouting).** Let D = (S, s<sub>0</sub>, *P*) be an MC with s<sub>⊤</sub>, s<sub>⊥</sub> ∉ S, let C ⊆ S be a set of expanded states, and let *γ* : S \ C → [0, 1] be a rerouting vector. The rerouting of MC D w.r.t. C and *γ* is the MC D↓C[*γ*] = (S ∪ {s<sub>⊤</sub>, s<sub>⊥</sub>}, s<sub>0</sub>, *P*<sup>C</sup><sub>γ</sub>) with:

$$\mathcal{P}^C_\gamma(s) = \begin{cases} \mathcal{P}(s) & \text{if } s \in C, \\ [s_\top \mapsto \gamma(s),\ s_\perp \mapsto 1 - \gamma(s)] & \text{if } s \in S \setminus C, \\ [s \mapsto 1] & \text{if } s \in \{s_\top, s_\perp\}. \end{cases}$$

Essentially, D↓C[*γ*] extends the MC D with additional sink states s<sub>⊤</sub> and s<sub>⊥</sub> and replaces all outgoing transitions of any non-expanded state s ∈ S \ C by a transition to s<sub>⊤</sub> (with probability *γ*(s)) and a complementary one to s<sub>⊥</sub>. We consider s<sub>⊤</sub> to be a new target and let ϕ′ denote the updated property. The transition s → s<sub>⊤</sub> with probability *γ*(s) may be considered a 'shortcut' that bypasses the successors of s and leads straight to the target s<sub>⊤</sub>. To ensure that D↓C[*γ*] is a CE, the value *γ*(s) must be a lower bound on the reachability probability from s in D. When constructing a CE for a single MC, we pick *γ* = **0**, whereas when this MC is induced by a realization r ∈ R, we can safely pick *γ* = *lb*<sup>R</sup>. The CE is then valid for every r ∈ R: it is a CE modulo R.

Algorithmically, we employ a state-exploration approach and therefore start with C<sup>(0)</sup> = ∅, i.e., all states are initially rerouted. If this is a CE, we are done. Otherwise, i.e., if the rerouting D↓C<sup>(0)</sup>[*γ*] satisfies ϕ′, we 'expand' some states to obtain a CE. Naturally, we must expand reachable states to change the satisfaction of ϕ′. By expanding a state s ∈ S, we abandon the abstraction associated with the shortcut from s to s<sub>⊤</sub> and replace it with the concrete behavior inherent to state s in MC D. Expanding a state cannot decrease the induced reachability probability, as *lb*<sup>R</sup> is a valid lower bound. This gradual expansion of the reachable state space continues until, for some C ⊆ S, the corresponding rerouting D↓C[*γ*] violates ϕ′. The process terminates, as D↓S[*γ*] ≡ D and, by assumption, D ⊭ ϕ. We show this process on an example.

*Example 1.* Reconsider D in Fig. 3 with ϕ ≡ P<sub>≤0.3</sub>[♦{t}]. Using the method outlined below, we get *lb*<sup>R</sup> = [s<sub>0</sub> ↦ 0.2, s<sub>1</sub> ↦ 0.6, s<sub>2</sub> ↦ 0.2, t ↦ 1, f ↦ 0]. In the absence of any bounds, the CE is {s<sub>0</sub>, s<sub>1</sub>, t}. Consider the gradual rerouting approach: we set *γ* = *lb*<sup>R</sup>, C<sup>(0)</sup> = ∅ and have D<sup>(0)</sup> := D<sub>r0</sub>↓C<sup>(0)</sup>[*γ*], see Fig. 4(a). Verifying this MC against ϕ′ = P<sub>≤0.3</sub>[♦(T ∪ {s<sub>⊤</sub>})] yields P[D<sup>(0)</sup>, s<sub>0</sub> ⊨ ♦(T ∪ {s<sub>⊤</sub>})] = *γ*(s<sub>0</sub>) = 0.2 ≤ 0.3, i.e., the set C<sup>(0)</sup> is not a CE. We now expand the initial state, i.e., C<sup>(1)</sup> = {s<sub>0</sub>}, and let D<sup>(1)</sup> := D<sub>r0</sub>↓C<sup>(1)</sup>[*γ*], see Fig. 4(b). Verifying D<sup>(1)</sup> yields P[D<sup>(1)</sup>, s<sub>0</sub> ⊨ ♦(T ∪ {s<sub>⊤</sub>})] = 1 · *γ*(s<sub>1</sub>) = 0.6 > 0.3. Thus, the set C<sup>(1)</sup> is critical

**Fig. 4.** Finding a CE for D<sub>r0</sub> and ϕ from Fig. 3 using the rerouting vector *γ* = *lb*<sup>R</sup>.


and the corresponding conflict is K<sub>C<sup>(1)</sup></sub> = supp(B(s<sub>0</sub>)) = {X}. This is smaller than the naively computed conflict {X, Y}.
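The two checks of Example 1 can be reproduced with a short, self-contained Python sketch of the rerouting of Definition 6; value iteration stands in for the model checker, and the concrete numbers are those of D<sub>r0</sub> and *lb*<sup>R</sup> above:

```python
# MC D_{r0} (realization X = s1, Y = t of the family in Fig. 3),
# with the merged transition probabilities written out explicitly.
P = {
    "s0": {"s1": 1.0},
    "s1": {"t": 0.8, "f": 0.2},
    "s2": {"t": 0.4, "f": 0.6},
    "t":  {"t": 1.0},
    "f":  {"f": 1.0},
}
lb = {"s0": 0.2, "s1": 0.6, "s2": 0.2, "t": 1.0, "f": 0.0}  # lb^R

def reroute(P, C, gamma):
    """Definition 6: expanded states keep their transitions; every other
    state is rerouted to s_top w.p. gamma(s) and to s_bot otherwise."""
    Pc = {"s_top": {"s_top": 1.0}, "s_bot": {"s_bot": 1.0}}
    for s in P:
        Pc[s] = (dict(P[s]) if s in C
                 else {"s_top": gamma[s], "s_bot": 1.0 - gamma[s]})
    return Pc

def reach(P, s0, target):
    """Reachability probability via value iteration."""
    x = {s: float(s in target) for s in P}
    for _ in range(100):
        x = {s: 1.0 if s in target else
                sum(p * x[v] for v, p in P[s].items()) for s in P}
    return x[s0]

target = {"t", "s_top"}              # phi': reach T or the new target
p0 = reach(reroute(P, set(), lb), "s0", target)   # C^(0) = {}
p1 = reach(reroute(P, {"s0"}, lb), "s0", target)  # C^(1) = {s0}
print(p0, p1)  # 0.2 <= 0.3: not yet a CE; 0.6 > 0.3: C^(1) is critical
```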

**Greedy state expansion strategy** Recall from Fig. 2 that for an MC D<sub>r</sub> with D<sub>r</sub> ⊭ ϕ, multiple CEs may exist, inducing different conflicts. An efficient expansion strategy should yield a CE that induces a small number of relevant parameters (to prune more family members), and this CE should preferably be obtained with a small number of model-checking queries. The method presented in Alg. 1 meets these criteria. The algorithm expands multiple states between subsequent model checks, while expanding only states associated with relevant parameters. In particular, in each iteration we keep track of the set K<sup>(i)</sup> of relevant parameters, optimistically starting with K<sup>(0)</sup> = ∅. We compute (see line 3) the set C<sup>(i)</sup> of states that are reachable from the initial state via states associated only with relevant parameters, i.e., via states for which supp(B(s)) ⊆ K<sup>(i)</sup>. Here, H<sup>(i)</sup> represents a state-exploration 'horizon': the set of states reachable from C<sup>(i)</sup> but containing some (still) irrelevant parameters. We then construct the corresponding rerouting D↓C<sup>(i)</sup>[*γ*] and check whether it is a CE. If it is not, we greedily choose a state s from the horizon H<sup>(i)</sup> containing the fewest irrelevant parameters and add these parameters to our

**Fig. 5.** Conceptual hybrid (dual-oracle) synthesis.

conflict (see line 7). The resulting conflict may not be minimal, but it is computed fast. Our algorithm also applies to probabilistic liveness properties<sup>2</sup>, using *γ* = *ub*<sup>R</sup>.
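The bookkeeping of this strategy (relevant parameters K<sup>(i)</sup>, expanded states C<sup>(i)</sup>, horizon H<sup>(i)</sup>) can be sketched as follows. This is a structural sketch only: the criticality check, which in Alg. 1 is a model-checking query on the rerouting, is stubbed by a callback that is consistent with Example 1 (where C = {s<sub>0</sub>} is already critical):

```python
# Topology and parameter supports of the family of Fig. 3.
succ = {"s0": {"s1", "s2"}, "s1": {"t", "f"}, "s2": {"t", "f"},
        "t": set(), "f": set()}
supp = {"s0": {"X"}, "s1": {"Y"}, "s2": {"Y"}, "t": set(), "f": set()}

def greedy_conflict(s0, is_critical):
    K = set()  # K^(0) = {}: optimistically, no parameter is relevant
    while True:
        # C^(i): states reachable from s0 via states with supp(s) <= K;
        # H^(i): reachable states still containing irrelevant parameters.
        C, H, frontier = set(), set(), [s0]
        while frontier:
            s = frontier.pop()
            if s in C or s in H:
                continue
            if supp[s] <= K:
                C.add(s)
                frontier.extend(succ[s])
            else:
                H.add(s)
        if is_critical(C):  # stand-in for model checking D|C[gamma]
            return K
        # greedily expand the horizon state with the fewest new parameters
        s = min(H, key=lambda h: len(supp[h] - K))
        K |= supp[s]

conflict = greedy_conflict("s0", lambda C: "s0" in C)  # stub oracle
print(conflict)  # parameter Y never becomes relevant
```

Note that the loop performs one (stubbed) model-checking query per round while possibly adding several states to C at once, mirroring the "at most |K| queries" behavior discussed in Sect. 6.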

**Computing bounds** We compute *lb*<sup>R</sup> and *ub*<sup>R</sup> using an abstraction [10]. The method considers a set R of realizations and computes the corresponding quotient Markov decision process (MDP) that over-approximates the behavior of all MCs in the family R. Model checking this MDP yields a lower and an upper bound on the induced probabilities for all states over all realizations in R. That is, *Bound*(D, R) computes *lb*<sup>R</sup> ∈ ℝ<sup>S</sup> and *ub*<sup>R</sup> ∈ ℝ<sup>S</sup> such that for each s ∈ S:

$$\mathit{lb}^{\mathcal{R}}(s) \le \min_{r \in \mathcal{R}} \mathbb{P}[\mathcal{D}_r, s \models \Diamond T] \le \max_{r \in \mathcal{R}} \mathbb{P}[\mathcal{D}_r, s \models \Diamond T] \le \mathit{ub}^{\mathcal{R}}(s).$$

To allow for refinement, two properties are crucial (with point-wise inequalities):

$$1.\ \mathit{lb}^{\mathcal{R}} \le \mathit{lb}^{\mathcal{R}'} \wedge \mathit{ub}^{\mathcal{R}} \ge \mathit{ub}^{\mathcal{R}'} \text{ for } \mathcal{R}' \subseteq \mathcal{R} \qquad \text{and} \qquad 2.\ \mathit{lb}^{\{r\}} = \mathit{ub}^{\{r\}} \text{ for } r \in \mathcal{R}.$$

In [10], the abstraction and refinement together define an abstraction-refinement loop (AR) that addresses the feasibility problem. In the worst case, this loop analyses 2 · |R| quotient MDPs, which (as of now) may be arbitrarily larger than the family members they represent.
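On the family of Fig. 3, the effect of *Bound*(D, R) can be imitated by treating every parameter occurrence as a local nondeterministic choice and running min/max value iteration on the resulting quotient-MDP-like model (our own simplified encoding, with T′ and F′ written `T` and `F`). The local choices may be inconsistent across states, which is exactly why the result is only a bound:

```python
from itertools import product

B = {"s0": {"X": 1.0}, "s1": {"T": 0.6, "Y": 0.2, "F": 0.2},
     "s2": {"T": 0.2, "Y": 0.2, "F": 0.6}, "t": {"T": 1.0}, "f": {"F": 1.0}}
domains = {"X": ["s1", "s2"], "Y": ["t", "f"], "T": ["t"], "F": ["f"]}

def bound(opt, target=("t",), n_iter=100):
    """Min (lb) or max (ub) value iteration over local parameter choices."""
    x = {s: float(s in target) for s in B}
    for _ in range(n_iter):
        new = {}
        for s in B:
            if s in target:
                new[s] = 1.0
                continue
            ks = sorted(B[s])
            # each 'action' fixes one successor per parameter in supp(B(s))
            new[s] = opt(
                sum(B[s][k] * x[v] for k, v in zip(ks, choice))
                for choice in product(*(domains[k] for k in ks)))
        x = new
    return x

lb, ub = bound(min), bound(max)
print(lb)  # matches Example 1: s0 -> 0.2, s1 -> 0.6, s2 -> 0.2
```

Both required properties hold in this sketch: splitting the family (shrinking the domains) can only tighten the bounds, and for a single realization the min and max coincide.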

### **5 Hybrid Dual-Oracle Synthesis**

We introduce an extended synthesis loop in which the abstraction-based reasoning is used to prune the family R, and to accelerate the CE-based oracle from Sect. 4. The intuitive idea is outlined in Fig. 5. Note that if the CE-based oracle is not exploited, we emulate AR (explained in computing bounds above), whereas if the abstraction oracle is not used, we emulate CEGIS (with the novel oracle).

Let us motivate combining these oracles in a flexible way. The naive version outlined in the previous section assumes a single abstraction step and invokes CEGIS with the bounds obtained from that step. Evidently, the better (tighter) the bounds *γ*, the better the CEs. However, the abstraction-based bounds for R may be very loose. These bounds can be improved by splitting the set R and using the bounds of the two sub-families. The idea is to run a limited number of

<sup>2</sup> Some care is required regarding loops, see [9].


```
Input : A family D, a reachability property ϕ.
Output: Either a member r of D with r |= ϕ, or "no such r exists in D".

1   R ← {R^D}                  // each analysed (sub-)family also holds bounds
2   δ_CEGIS ← 1                // time allocation factor for CEGIS
3   while true do
4       result, R′, σ_AR, t_AR ← AR.run(R, ϕ)
5       if result.decided() then return result
6       CEGIS.setTimeout(t_AR · δ_CEGIS)
7       result, σ_CEGIS, R″ ← CEGIS.run(R′, ϕ)
8       if result.decided() then return result
9       δ_CEGIS ← σ_CEGIS / σ_AR
10      R ← R″
11  end while
```

AR steps and then invoke CEGIS. Our experiments reveal that it can be crucial to be adaptive, i.e., the integrated method must be able to detect at run time when to switch.

The proposed hybrid method switches between AR and CEGIS, where we allow for refining during the AR phase and use the obtained refined bounds during CEGIS. Additionally, we estimate the efficiency σ (e.g., the number of pruned MCs per time unit) of the two methods and allocate more time t to the method with superior performance. That is, if we detect that CEGIS prunes sub-families twice as fast as AR, we double the time allocated to CEGIS in the next round. The resulting algorithm is summarized in Alg. 2. Recall that AR (line 4) takes one family from R and either solves it or splits it, returning the set R′ of undecided families. In contrast, CEGIS processes multiple families from R′ until the timeout and then returns the set R″ of undecided families. This workflow is motivated by the fact that one iteration of AR (i.e., the involved MDP model checking) is typically significantly slower than one CEGIS iteration.
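The adaptive time allocation of Alg. 2 can be simulated in a few lines of Python. Everything below is a toy model: both oracles are stubs that prune a counter of undecided members at invented fixed rates; the point is only the bookkeeping, i.e., that CEGIS receives a timeout of t<sub>AR</sub> · δ<sub>CEGIS</sub> and that δ<sub>CEGIS</sub> is updated to σ<sub>CEGIS</sub>/σ<sub>AR</sub>:

```python
def ar_step(undecided):
    """Stub AR oracle: prunes 100 members and takes 1.0 s per call."""
    pruned = min(undecided, 100)
    return undecided - pruned, pruned, 1.0

def cegis_step(undecided, timeout):
    """Stub CEGIS oracle: prunes 400 members per second until timeout."""
    pruned = min(undecided, int(400.0 * timeout))
    return undecided - pruned, pruned

undecided, delta, rounds = 10_000, 1.0, 0
while undecided > 0:
    rounds += 1
    undecided, p_ar, t_ar = ar_step(undecided)           # line 4 of Alg. 2
    sigma_ar = p_ar / t_ar
    timeout = t_ar * delta                               # line 6
    undecided, p_cegis = cegis_step(undecided, timeout)  # line 7
    sigma_cegis = p_cegis / timeout
    delta = sigma_cegis / sigma_ar                       # line 9
print(rounds, undecided)
```

With these invented rates, δ settles at 4 after the first round, so CEGIS consistently receives four times the time AR just consumed, exactly the behavior the adaptive scheme is designed to produce.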

*Remark 1.* Although the developed framework for integrated synthesis has been discussed in the context of feasibility with respect to a single property ϕ, it can easily be generalized to handle multiple-property specifications as well as optimal synthesis. For multiple properties, the idea remains the same: analyzing the quotient MDP with respect to multiple properties yields multiple probability bounds. After initiating a CEGIS loop and obtaining a violating realization, we construct a separate conflict for each unsatisfied property, using the corresponding probability bound to enhance the CE generation. Optimal synthesis is handled similarly to feasibility, but after obtaining a satisfying solution, we update the optimizing property to exclude this solution: e.g., for maximal synthesis this translates to increasing the threshold of the maximizing property. Having exhausted the search space of family members, we declare the last obtained solution optimal.
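The threshold-tightening scheme for maximal synthesis can be illustrated by a toy sketch (the member values are invented, and `find_feasible` stands in for the entire feasibility loop):

```python
# Toy values P[D_r |= <>T] for four hypothetical family members.
values = {"r0": 0.8, "r1": 0.6, "r2": 0.4, "r3": 0.2}

def find_feasible(threshold):
    """Stub feasibility query: any member exceeding the threshold."""
    for r, v in values.items():
        if v > threshold:
            return r, v
    return None

# Maximal synthesis via repeated feasibility: each solution raises the
# threshold to exclude itself, so the last solution found is the maximizer.
# (In practice the initial threshold is the one from the specification.)
threshold, best = 0.0, None
while (sol := find_feasible(threshold)) is not None:
    best, threshold = sol
print(best, threshold)  # r0 with value 0.8
```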


**Table 1.** Summary of the benchmarks and their statistics

### **6 Experimental Evaluation**

Implementation. We implemented the hybrid oracle on top of the probabilistic model checker Storm [18]. While the high-performance parts were implemented in C++, we used a Python API to flexibly construct the overall synthesis loop. For SMT solving, we used Z3 [29]. The tool chain takes a PRISM [27] or JANI [6] sketch and a set of temporal properties, and returns a satisfying realization, if one exists, or reports that no such realization exists. The implementation, in the form of an artefact, is available at https://zenodo.org/record/4422543.

Set-up. We compare the adaptive oracle-guided synthesis with two state-of-the-art synthesis methods: program-level CEGIS [9] using MaxSat CE generation [16,41], and AR [10]. Both use the same architecture and data structures from Storm. All experiments are run on an Ubuntu 19.04 machine with an Intel i5-8300H CPU (4 cores at 2.3 GHz) and up to 8 GB RAM, with all algorithms executed on a single thread. The benchmarks consist of five different models, see Table 1, from various domains that were used in [9,10]. As opposed to the benchmarks considered in [9,10], we use larger variants of Grid and Herman to better demonstrate the differences in performance of the individual methods.

To investigate the scalability of the methods, we consider a new variant of the Herman model that allows us to scale the number of randomization strategies and thus the family size. In particular, we compare performance on two instances of different sizes: small Herman<sup>∗</sup> (5k members) and large Herman<sup>∗</sup> (3.1M members; other statistics are reported in Table 1).

To reason about the pruning efficiency of different synthesis methods, we want to avoid feasible synthesis problems, where the order of family exploration can lead to inconsistent performance. Instead, we primarily focus on non-feasible problems, where all realizations need to be explored in order to prove unsatisfiability. The experimental evaluation is presented in three parts: (1) we evaluate the novel CE construction method and compare it with the MaxSat-based oracle from [9]; (2) we compare the hybrid synthesis loop with the two baselines AR and CEGIS; (3) we consider novel hard synthesis instances (multi-property synthesis, finding optimal programs) on instances of the Herman<sup>∗</sup> model.

**Comparing CE construction methods** We consider the quality of the CEs and their generation time. In particular, we investigate (1) whether using CEs modulo families yields better CEs, and (2) how the quality and the time consumption of CEs from the smart oracle compare to those of the MaxSat-based oracle. As a measure of the quality of a CE, we take the average number of its relevant parameters relative to the total number of parameters; that is, smaller


**Table 2.** CE quality for different methods and performance of three synthesis methods. For each model/property, we report results for two different thresholds, where the symbol '∗' marks the one closer to the feasibility threshold, representing the more difficult synthesis problem. The symbol '-' marks a two-hour timeout. **CE quality**: the presented numbers give the CE quality (the smaller, the better); the numbers in parentheses give the average run-time of constructing one CE in seconds (run-times for constructing CEs using non-trivial bounds are similar to those for trivial ones and are thus not reported). **Performance**: for each method, we report the number of iterations (for the hybrid method, iterations of the CEGIS and AR oracles, respectively) and the run-time in seconds.

ratios imply better CEs. To measure the influence of using CEs modulo families, two types of bounds are used: (i) trivial bounds (i.e., *γ* = **0** for safety and *γ* = **1** for liveness properties), and (ii) non-trivial bounds corresponding to the entire family R<sup>D</sup>, representing the most conservative estimate. The results are reported in the left part of Table 2. In the next subsection, we investigate the same benchmark from the point of view of the performance of the synthesis methods, which also shows the immediate effect of the new CE generation strategy.

The first observation is that using non-trivial bounds (as opposed to trivial ones) for the state-expansion approach can drastically decrease the conflict size. It turns out that the CEs obtained using the greedy approach are mostly larger than those obtained with the MaxSat method. However (see Grid), even for trivial bounds we may obtain smaller CEs than with MaxSat: computing a minimal-command CE does not necessarily induce an optimal conflict. On the other hand, comparing the run-times in parentheses, computing CEs via greedy state expansion is orders of magnitude faster than computing command-optimal ones using MaxSat. Note that the greedy method makes at most |K| model-checking queries per CE, while the MaxSat method may make exponentially many such queries. Overall, the greedy method using non-trivial bounds obtains CEs of comparable quality to the MaxSat method while being orders of magnitude faster.

**Performance comparison with AR/CEGIS** We compare the hybrid synthesis loop from Sect. 5 with two state-of-the-art baselines: CEGIS and AR. The results are displayed in the right half of Table 2. In all 10 cases, the hybrid method outperforms the baselines, being up to an order of magnitude faster.

Let us discuss the performance of the hybrid method. We classify the benchmarks along two dimensions: (1) the performance of CEGIS and (2) the performance of AR. Based on the empirical performance, we classify Grid as good-for-CEGIS (and not for AR); Maze, Pole, and DPM as good-for-AR (and not for CEGIS); and Herman as hard (for both). Roughly, AR works well when the quotient MDP does not blow up and its analysis is precise due to consistent schedulers, i.e., when the parameter dependencies are not crucial for a precise analysis. CEGIS performs well when the CEs are small and fast to compute. Synthesis problems for which neither pure CEGIS nor pure AR can effectively reason about non-trivial subfamilies inherently profit from a hybrid method. The main point we want to discuss is how the hybrid method reinforces the strengths of both methods rather than their weaknesses.

In the hybrid method, two factors determine the efficiency: (i) how fast do we obtain bounds on the reachability probability that are tight enough to enable the construction of good counterexamples, and (ii) how good are the constructed counterexamples? The former factor is addressed by the proposed adaptive scheme (see Alg. 2), where the method prefers AR-like analysis and continues refinement until the computed bounds allow the construction of small counterexamples. The latter was discussed above. Let us now discuss how these two aspects are reflected in the benchmarks.

In good-for-CEGIS benchmarks like Grid, after analyzing a quotient MDP for the whole family, the hybrid method mostly profits from the better CEs enabled by the computed bounds, thus outperforming CEGIS. Indeed, the CEs are found so fast that their generation is no longer the bottleneck, which also explains why the speed-up in CE construction does not fully translate into a speed-up of the overall synthesis loop. In the good-for-AR benchmark DPM, the hybrid method provides only a minor improvement, as it has to perform a large number of AR iterations before the novel CE-based pruning can be used effectively; this can be considered the worst-case scenario for the hybrid method. On other good-for-AR benchmarks like Maze and Pole, the good performance of AR allows tight bounds to be obtained quickly, which can then be exploited by CEGIS. Finally, in hard models like Herman, abstraction refinement is very expensive, but already the bounds from the first round, as opposed to the trivial bounds, enable good CEs: CEGIS can keep using these bounds to quickly prune the state space.

**More complicated synthesis problems** Our new approach can push the limits of synthesis benchmarks significantly. We illustrate this by considering a new variant of the Herman model, Herman<sup>∗</sup>, and a property imposing an upper bound on the expected number of rounds until stabilization. We put this bound just below the optimal (i.e., the minimal) value, yielding a hard non-feasible problem. The synthesis results are summarized in Table 3. As CEGIS performs poorly on Herman, it is excluded here.


**Table 3.** The impact of scaling the family size (of the *Herman*<sup>∗</sup> model) and handling more complex synthesis problems. The left part shows the results for the smaller variant (5k members), the right part for the larger one (3.1M members).

First, we investigate on the small Herman<sup>∗</sup> how the methods handle synthesis for multi-property specifications. We add one feasible property to the (still non-feasible) specification (row 'two properties'). While including more properties typically slows down the AR computation, the performance of the hybrid method is not affected, as the corresponding overhead is mitigated by additional pruning opportunities. Second, we consider optimal synthesis for the property used in the feasibility synthesis. The hybrid method requires only minor overhead to find an optimal solution compared to checking feasibility. This overhead is significantly larger for AR.

Next, we consider the larger Herman<sup>∗</sup> model with significantly more randomization strategies (3.1M members), including solutions that lead to considerably faster stabilization. This model is out of reach for existing synthesis approaches: one-by-one enumeration takes more than 27 hours, and AR performs even worse: solving the feasibility and optimality problems requires 47 and 55 hours, respectively. The proposed hybrid method, on the other hand, solves these problems within minutes. Finally, we consider a relaxed variant of optimal synthesis (5%-optimality) guaranteeing that the found solution is at most 5% worse than the optimum. Relaxing the optimality criterion speeds up the hybrid synthesis method by about a factor of three.

These experiments clearly demonstrate that scaling up the synthesis problem by several orders of magnitude renders existing synthesis methods infeasible: they need tens of hours to solve the synthesis problems. Meanwhile, the hybrid method tackles these difficult synthesis problems without significant penalty and produces a solution within minutes.

### **7 Conclusion**

We present a novel method for the automated synthesis of probabilistic programs. Pairing counterexample-guided inductive synthesis with a deductive oracle based on an MDP abstraction, we develop a synthesis technique enabling the faster construction of smaller counterexamples. Evaluating the method on case studies from different domains, we demonstrate that the novel CE construction and the adaptive strategy lead to a significant acceleration of the synthesis process: the proposed method reduces the run-time for challenging problems from days to minutes. In future work, we plan to investigate counterexamples on the quotient MDPs and to improve the abstraction-refinement strategy.


# **Analysis of Markov Jump Processes under Terminal Constraints**

Michael Backenköhler<sup>1</sup>, Luca Bortolussi<sup>2,3</sup>, Gerrit Großmann<sup>1</sup>, Verena Wolf<sup>1,3</sup>

<sup>1</sup> Saarbrücken Graduate School of Computer Science, Saarland University, Saarland Informatics Campus E1 3, Saarbrücken, Germany michael.backenkoehler@uni-saarland.de

<sup>2</sup> University of Trieste, Trieste, Italy

<sup>3</sup> Saarland University, Saarland Informatics Campus E1 3, Saarbrücken, Germany

**Abstract.** Many probabilistic inference problems such as stochastic filtering or the computation of rare event probabilities require model analysis under initial and terminal constraints. We propose a solution to this *bridging problem* for the widely used class of population-structured Markov jump processes. The method is based on a state-space lumping scheme that aggregates states in a grid structure. The resulting approximate bridging distribution is used to iteratively refine relevant and truncate irrelevant parts of the state-space. This way, the algorithm learns a well-justified finite-state projection yielding guaranteed lower bounds for the system behavior under endpoint constraints. We demonstrate the method's applicability to a wide range of problems such as Bayesian inference and the analysis of rare events.

**Keywords:** Bayesian Inference · Bridging problem · Smoothing · Lumping · Rare Events.

### **1 Introduction**

Discrete-valued, continuous-time Markov jump processes (MJPs) are widely used to model the time evolution of complex discrete phenomena. Such problems naturally occur in a wide range of areas such as chemistry [16], systems biology [49,46], epidemiology [36] as well as queuing systems [10] and finance [39]. In many applications, an MJP describes the stochastic interaction of populations of agents. The state variables are counts of individual entities of different populations.

Many tasks, such as the analysis of rare events or the inference of agent counts under partial observations, naturally introduce terminal constraints on the system. In these cases, the system's initial state is known, as well as the system's (partial) state at a later time-point. The probabilities corresponding to this so-called bridging problem are often referred to as bridging probabilities [17,19]. For instance, if the exact, full state of the process $X_t$ has been observed at times $0$ and $T$, the bridging distribution is given by

$$\Pr(X\_t = x \mid X\_0 = x\_0, X\_T = x\_g)$$

© The Author(s) 2021

J. F. Groote and K. G. Larsen (Eds.): TACAS 2021, LNCS 12651, pp. 210–229, 2021. https://doi.org/10.1007/978-3-030-72016-2 12

for all states $x$ and times $t \in [0, T]$. Often, the condition is more complex, such that in addition to an initial distribution, a terminal distribution is present. Such problems typically arise in a Bayesian setting, where the a priori behavior of a system is filtered such that the posterior behavior is compatible with noisy, partial observations [11,25]. For example, time-series data of protein levels may be available while the mRNA concentration is not [1,25]. In such a scenario our method can be used to identify a good truncation to analyze the probabilities of mRNA levels.

Bridging probabilities also appear in the context of rare events. Here, the rare event is the terminal constraint because we are only interested in paths containing the event. Typically, researchers have to resort to Monte Carlo simulations in combination with variance reduction techniques in such cases [14,26].

Efficient numerical approaches that are not based on sampling or ad-hoc approximations have rarely been developed.

Here, we combine state-of-the-art truncation strategies based on a forward analysis [28,4] with a refinement approach that starts from an abstract MJP with lumped states. We base this lumping on a grid-like partitioning of the state-space. Throughout a lumped state, we assume a uniform distribution that gives an efficient and convenient abstraction of the original MJP. Note that the lumping does not follow the classical paradigm of Markov chain lumpability [12] or its variants [15]. Instead of an approximate block structure of the transition-matrix used in that context, we base our partitioning on a segmentation of the molecule counts. Moreover, during the iterative refinement of our abstraction, we identify those regions of the state-space that contribute most to the bridging distribution. In particular, we refine those lumped states that have a bridging probability above a certain threshold δ and truncate all other macro-states. This way, the algorithm learns a truncation capturing most of the bridging probabilities. This truncation provides guaranteed lower bounds because it is at the granularity of the original model.

In the rest of the paper, after presenting related work (Section 2) and background (Section 3), we discuss the method (Section 4) and several applications, including the computation of rare event probabilities as well as Bayesian smoothing and filtering (Section 5).

### **2 Related Work**

The problem of endpoint constrained analysis occurs in the context of Bayesian estimation [41]. For population-structured MJPs, this problem has been addressed by Huang et al. [25] using moment closure approximations and by Wildner and Köppl [48] further employing variational inference. Golightly and Sherlock modified stochastic simulation algorithms to approximately augment generated trajectories [17]. Since a statistically exact augmentation is only possible for a few simple cases, diffusion approximations [18] and moment approximations [35] have been employed. Such approximations, however, do not give any guarantees on the approximation error and may suffer from numerical instabilities [43].

The bridging problem also arises during the estimation of first passage times and rare event analysis. Approaches for first-passage times are often of a heuristic nature [42,22,8]. Rigorous approaches yielding guaranteed bounds are currently limited by the performance of state-of-the-art optimization software [6]. In biological applications, rare events of interest are typically related to the reachability of certain thresholds on molecule counts or to mode switching [45]. Most methods for the estimation of rare event probabilities rely on importance sampling [26,14]. For other queries, alternative variance reduction techniques such as control variates are available [5]. Apart from sampling-based approaches, dynamic finite-state projections have been employed by Mikeev et al. [34], but lack automated truncation schemes.

The analysis of countably infinite state-spaces is often handled by a predefined truncation [27]. Sophisticated state-space truncations for the (unconditioned) forward analysis have been developed to give lower bounds and rely on a trade-off between computational load and tightness of the bound [37,28,4,24,31].

Reachability analysis, which is relevant in the context of probabilistic verification [8,38], is a bridging problem where the endpoint constraint is the visit of a set of goal states. Backward probabilities are commonly used to compute reachability likelihoods [2,50]. Approximate techniques for reachability, based on moment closure and stochastic approximation, have also been developed in [8,9], but lack error guarantees. There is also a conceptual similarity between computing bridging probabilities and the forward-backward algorithm for computing state-wise posterior marginals in hidden Markov models (HMMs) [40]. Like MJPs, HMMs are generative models that can be conditioned on observations. Although we only consider two observations (the initial and terminal states), which are not necessarily noisy, the forward and backward probabilities carry the same meaning.

### **3 Preliminaries**

#### **3.1 Markov Jump Processes with Population Structure**

A population-structured Markov jump process (MJP) describes the stochastic interactions among agents of distinct types in a well-stirred reactor. The assumption that all agents are equally distributed in space allows us to only keep track of the overall copy number of agents of each type. Therefore the state-space is $\mathcal{S} \subseteq \mathbb{N}^{n_S}$, where $n_S$ denotes the number of agent types or populations. Interactions between agents are expressed as reactions. These reactions have associated gains and losses of agents, given by non-negative integer vectors $v_j^-$ and $v_j^+$ for reaction $j$, respectively. The overall effect is given by $v_j = v_j^+ - v_j^-$. A reaction between agents of types $S_1, \dots, S_{n_S}$ is specified in the following form:

$$\sum\_{\ell=1}^{n\_S} v\_{j\ell}^- S\_\ell \xrightarrow{\alpha\_j(x)} \sum\_{\ell=1}^{n\_S} v\_{j\ell}^+ S\_\ell \,. \tag{1}$$

The propensity function $\alpha_j$ gives the rate of the exponentially distributed firing time of the reaction as a function of the current system state $x \in \mathcal{S}$. In population models, mass-action propensities are most common. In this case the firing rate is given by the product of the number of reactant combinations in $x$ and a rate constant $c_j$, i.e.

$$\alpha\_j(x) \coloneqq c\_j \prod\_{\ell=1}^{n\_S} \binom{x\_\ell}{v\_{j\ell}^-} \,. \tag{2}$$

In this case, we give the rate constant in (1) instead of the function $\alpha_j$. For a given set of $n_R$ reactions, we define a stochastic process $\{X_t\}_{t \geq 0}$ describing the evolution of the population sizes over time $t$. Due to the assumption of exponentially distributed firing times, $X$ is a continuous-time Markov chain (CTMC) on $\mathcal{S}$ with infinitesimal generator matrix $Q$, where the entries of $Q$ are

$$Q\_{x,y} = \begin{cases} \sum\_{j:x+v\_j=y} \alpha\_j(x), & \text{if } x \neq y, \\ -\sum\_{j=1}^{n\_R} \alpha\_j(x), & \text{otherwise.} \end{cases} \tag{3}$$

The probability distribution over time can be analyzed as an initial value problem. Given an initial state $x_0$, the distribution<sup>1</sup>

$$\pi(x\_i, t) = \Pr(X\_t = x\_i \mid X\_0 = x\_0), \quad t \ge 0 \tag{4}$$

evolves according to the Kolmogorov forward equation

$$\frac{d}{dt}\pi(t) = \pi(t)Q\,, \tag{5}$$

where $\pi(t)$ is an arbitrary vectorization $(\pi(x_1, t), \pi(x_2, t), \dots, \pi(x_{|\mathcal{S}|}, t))$ of the states.
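To make the vectorized forward equation concrete, the following sketch assembles the generator of (3) for a truncated birth-death model and integrates (5) with explicit Euler steps. The truncation bound, rates, and step size are illustrative assumptions, not the integrator used in the paper:

```python
# Sketch: generator Q of eq. (3) for a truncated birth-death model and a
# forward-Euler integration of eq. (5). Truncation bound, rates, and step
# size are assumptions for illustration.
N = 40                                    # truncation bound
states = range(N + 1)

# reactions as (state change v_j, propensity alpha_j)
reactions = [(+1, lambda x: 10.0),        # birth at constant rate 10
             (-1, lambda x: 0.1 * x)]     # death, linear in the count

Q = [[0.0] * (N + 1) for _ in states]
for x in states:
    for v, alpha in reactions:
        rate = alpha(x)
        if 0 <= x + v <= N:
            Q[x][x + v] += rate
        Q[x][x] -= rate                   # exit rate, incl. truncated targets

# forward Euler for d/dt pi(t) = pi(t) Q, starting in state 0
h, T = 1e-3, 1.0
pi = [0.0] * (N + 1)
pi[0] = 1.0
for _ in range(int(T / h)):
    pi = [p + h * sum(pi[x] * Q[x][y] for x in states)
          for y, p in enumerate(pi)]
```

In practice a stiff implicit integrator would replace the Euler loop; the sketch only illustrates how the generator and the vectorized distribution interact.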

Let $x_g \in \mathcal{S}$ be a fixed goal state. Given the terminal constraint $\Pr(X_T = x_g)$ for some $T \geq 0$, we are interested in the so-called backward probabilities

$$\beta(x\_i, t) = \Pr(X\_T = x\_g \mid X\_t = x\_i), \quad t \le T. \tag{6}$$

Note that $\beta(\cdot, t)$ is a function of the conditioning event and thus not a probability distribution over the state-space. Instead, $\beta(\cdot, t)$ gives the reaching probabilities for all states over the time span $[t, T]$. To compute these probabilities, we can employ the Kolmogorov backward equation

$$\frac{d}{dt}\beta(t)^{\top} = Q\beta(t)^{\top},\tag{7}$$

where we use the same vectorization to construct $\beta(t)$ as for $\pi(t)$. The above equation is integrated backwards in time and yields, for each state $x_i$ and time $t < T$, the probability of ending up in $x_g$ at time $T$.
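As a sketch of the backward computation, the toy example below integrates the reaching probabilities of a two-state chain backwards from the terminal condition. The chain, step size, and the explicit-Euler stepping convention $\beta(t-h) = \beta(t) + hQ\beta(t)$ are assumptions for illustration:

```python
import math

# Sketch: reaching probabilities beta(x, t) = Pr(X_T = x_g | X_t = x) for the
# toy chain 0 --r--> 1 (state 1 absorbing), obtained by stepping the backward
# equation from t = T down to t = 0. Chain, rates, step size, and the Euler
# convention beta(t - h) = beta(t) + h * Q beta(t) are assumptions.
r, T, h = 1.0, 2.0, 1e-4
Q = [[-r, r],
     [0.0, 0.0]]                      # generator of the two-state chain

x_g = 1
beta = [0.0, 0.0]
beta[x_g] = 1.0                       # terminal condition at time T

for _ in range(int(round(T / h))):    # integrate backwards in time
    beta = [b + h * sum(Q[i][k] * beta[k] for k in range(2))
            for i, b in enumerate(beta)]

# analytically, beta(0, 0) = 1 - exp(-r * T) for this chain
```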

<sup>1</sup> In the sequel, $x_i$ denotes a state with index $i$ instead of its $i$-th component.

The state-space of many MJPs with population structure, even simple ones, is countably infinite. In this case, we have to truncate the state-space to a reasonable finite subset. The choice of this truncation heavily depends on the goal of the analysis. If one is interested in the most "common" behavior, for example, a dynamic mass-based truncation scheme is most appropriate [32]. Such a scheme truncates states with small probability during the numerical integration. However, common mass-based truncation schemes are not as useful for the bridging problem. This is because trajectories that meet the specific terminal constraints can be far off the main bulk of the probability mass. We solve this problem by a state-space lumping in connection with an iterative refinement scheme.

Consider as an example a birth-death process. This model can be used to model a wide variety of phenomena and often constitutes a sub-module of larger models. For example, it can be interpreted as an M/M/1 queue with service rates linearly dependent on the queue length. Note that even for this simple model, the state-space is countably infinite.

**Model 1 (Birth-Death Process).** The model consists of exponentially distributed arrivals and service times proportional to queue length. It can be expressed using two mass-action reactions:

$$
\emptyset \xrightarrow{10} X \qquad \text{and} \qquad X \xrightarrow{0.1} \emptyset \,.
$$

The initial condition $X_0 = 0$ holds with probability one.

#### **3.2 Bridging Distribution**

The process' probability distribution given both initial and terminal constraints is formally described by the conditional probabilities

$$\gamma(x\_i, t) = \Pr(X\_t = x\_i \mid X\_0 = x\_0, X\_T = x\_g), \quad 0 \le t \le T \tag{8}$$

for a fixed initial state $x_0$ and terminal state $x_g$. We call these probabilities the bridging probabilities. It is straightforward to see that $\gamma$ admits the factorization

$$
\gamma(x\_i, t) = \pi(x\_i, t)\beta(x\_i, t) / \pi(x\_g, T) \tag{9}
$$

due to the Markov property. The normalization factor, given by the reachability probability $\pi(x_g, T) = \beta(x_0, 0)$, ensures that $\gamma(\cdot, t)$ is a distribution for all time points $t \in [0, T]$. We call each $\gamma(\cdot, t)$ a bridging distribution. From the Kolmogorov equations (5) and (7) we can obtain both the forward probabilities $\pi(\cdot, t)$ and the backward probabilities $\beta(\cdot, t)$ for $t < T$.
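The factorization (9) can be checked on a toy example where both factors are available in closed form. The two-state chain below (one unit-rate transition into an absorbing goal state) and the chosen time points are assumptions for illustration:

```python
import math

# Sketch of the factorization (9): for the toy chain 0 --r--> 1 (state 1
# absorbing) both pi and beta are known in closed form, so the bridging
# distribution gamma can be checked to normalize.
r, T = 1.0, 2.0
x0, x_g = 0, 1

def pi(x, t):                       # forward: Pr(X_t = x | X_0 = 0)
    return math.exp(-r * t) if x == 0 else 1 - math.exp(-r * t)

def beta(x, t):                     # backward: Pr(X_T = 1 | X_t = x)
    return 1 - math.exp(-r * (T - t)) if x == 0 else 1.0

def gamma(x, t):                    # bridging distribution, eq. (9)
    return pi(x, t) * beta(x, t) / pi(x_g, T)

for t in (0.0, 0.5, 1.0, 1.7):
    assert abs(gamma(0, t) + gamma(1, t) - 1.0) < 1e-12
```

At $t = 0$ all bridging mass sits on the initial state and at $t = T$ on the goal state, as the conditioning requires.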

We can easily extend this procedure to deal with hitting times constrained by a finite time-horizon by making the goal state $x_g$ absorbing.

In Figure 1 we plot the forward, backward, and bridging probabilities for Model 1. The probabilities are computed on a $[0, 100]$ state-space truncation. The approximate forward solution $\hat{\pi}$ shows how the probability mass drifts upwards towards the stationary distribution Poisson(100). The backward probabilities

**Fig. 1.** Forward, backward, and bridging probabilities for Model 1 with initial constraint $X_0 = 0$ and terminal constraint $X_{10} = 40$ on a truncated state-space. Probabilities over 0.1 in $\hat{\pi}$ and $\hat{\beta}$ are given full intensity for visual clarity. The lightly shaded area ($\geq 60$) indicates a region that is more relevant for the forward than for the bridging probabilities.

are highest for states below the goal state $x_g = 40$. This is expected because the upwards drift makes reaching $x_g$ more probable for "lower" states. Finally, the approximate bridging distribution $\hat{\gamma}$ can be recognized to be proportional to the product of the forward probabilities $\hat{\pi}$ and the backward probabilities $\hat{\beta}$.

### **4 Bridge Truncation via Lumping Approximations**

We first discuss the truncation of countably infinite state-spaces to analyze backward and forward probabilities (Section 4.1). To identify effective truncations we employ a lumping scheme. In Section 4.2, we explain the construction of macrostates and assumptions made, as well as the efficient calculation of transition rates between them. Finally, in Section 4.3 we present an iterative refinement algorithm yielding a suitable truncation for the bridging problem.

#### **4.1 Finite State Projection**

Even in simple models such as the birth-death process (Model 1), the reachable state-space is countably infinite. Direct analyses of the backward (6) and forward (4) probabilities are often infeasible. Instead, the integration of these differential equations requires working with a finite subset of the infinite state-space [37]. If states are truncated, their incoming transitions from states that are not truncated can be re-directed to a sink state. The accumulated probability in this sink state is then used as an error estimate for the forward integration scheme. Consequently, many truncation schemes, such as dynamic truncations [4], aim to minimize the amount of "lost mass" of the forward probability. We use the same truncation method but base the truncation on the bridging probabilities rather than the forward probabilities.
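The sink-state construction can be sketched as follows for a unit-rate arrival process; the retained set, the time horizon, and the Euler discretization are illustrative assumptions:

```python
# Sketch: forward integration on a truncated state-space with a sink state.
# Transitions leaving the retained set are redirected to the sink, whose
# accumulated mass serves as the truncation-error estimate. The retained
# set, rate, horizon, and Euler step are assumptions for illustration.
keep = set(range(8))                 # retained states of a pure birth process
birth = 1.0                          # unit-rate arrivals

h, T = 1e-3, 3.0
p = {x: 0.0 for x in keep}
p[0] = 1.0
sink = 0.0

for _ in range(int(T / h)):
    new_p = dict(p)
    for x in keep:
        out = birth * p[x] * h       # probability leaving x in this step
        new_p[x] -= out
        if x + 1 in keep:
            new_p[x + 1] += out
        else:
            sink += out              # redirected to the sink state
    p = new_p

total = sum(p.values()) + sink       # mass is conserved incl. the sink
```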

#### **4.2 State-Space Lumping**

When dealing with bridging problems, the most likely trajectories from the initial to the terminal state are typically not known a priori. Especially if the event in question is rare, obtaining a state-space truncation adapted to its constraints is difficult. We devise a lumping scheme that groups nearby states, i.e. similar molecule counts, into larger macro-states. A macro-state is a collection of states treated as one state in a lumped model, which can be seen as an abstraction of the original model. These macro-states form a partitioning of the state-space. In the lumped model, we assume a uniform distribution over the constituent micro-states inside each macro-state. Thus, given that the system is in a particular macro-state, all of its micro-states are equally likely. This partitioning allows us to analyze significant regions of the state-space efficiently, albeit under a rough approximation of the dynamics. Iterative refinement of the state-space after each analysis moves the dynamics closer to the original model. In the final step of the iteration, the considered system states are at the granularity of the original model, so that no approximation error is introduced by the assumptions of the lumping scheme. Computational efficiency is retained by truncating, in each iteration step, those states that contribute little probability mass to the (approximated) bridging distributions.

We choose a lumping scheme based on a grid of hypercube macro-states whose endpoints belong to a predefined grid. This topology makes the computation of transition rates between macro-states particularly convenient. Mass-action reaction rates, for example, can be given in a closed-form due to the Faulhaber formulae. More complicated rate functions such as Hill functions can often be handled as well by taking appropriate integrals.

Our choice is a scheme that uses $n_S$-dimensional hypercubes. A macro-state $\bar{x}_i(\ell^{(i)}, u^{(i)})$ (denoted by $\bar{x}_i$ for notational ease) can therefore be described by two vectors $\ell^{(i)}$ and $u^{(i)}$. The vector $\ell^{(i)}$ gives the corner closest to the origin, while $u^{(i)}$ gives the corner farthest from the origin. Formally,

$$\bar{x}\_i = \bar{x}\_i(\ell^{(i)}, u^{(i)}) = \{x \in \mathbb{N}^{n\_S} \mid \ell^{(i)} \le x \le u^{(i)}\},\tag{10}$$

where '$\leq$' stands for the element-wise comparison. This choice of topology makes the computation of transition rates between macro-states particularly convenient: Suppose we are interested in the set of micro-states in macro-state $\bar{x}_i$ that can transition to macro-state $\bar{x}_k$ via reaction $j$. It is easy to see that this set is itself an interval-defined macro-state $\bar{x}_{i\stackrel{j}{\rightarrow}k}$. To compute this macro-state we can simply shift $\bar{x}_i$ by $v_j$, take the intersection with $\bar{x}_k$, and shift the result back. Formally,

$$\bar{x}\_{i \stackrel{j}{\rightarrow} k} = \left( (\bar{x}\_i + v\_j) \cap \bar{x}\_k \right) - v\_j \; , \tag{11}$$

where the additions are applied element-wise to all states making up the macrostates. For the correct handling of the truncation it is useful to define a general exit state

$$
\bar{x}\_{i\stackrel{j}{\rightarrow}} = \left( (\bar{x}\_i + v\_j) \setminus \bar{x}\_i \right) - v\_j. \tag{12}
$$

This state captures all micro-states inside $\bar{x}_i$ that can leave the state via reaction $j$. Note that all operations preserve the structure of a macro-state as defined in (10). Since a macro-state is based on intervals, the computation of the transition rate is often straightforward. Under the assumption of polynomial rates, as is the case for mass-action systems, we can compute the sum of rates over this transition set efficiently using Faulhaber's formula.

**Fig. 2.** A lumping approximation of Model 1 on the state-space truncation to $[0, 200]$ on $t \in [0, 50]$. On the left-hand side, solutions of a regular truncation approximation and a lumped truncation (macro-state size 5) are given. On the right-hand side, the respective terminal distributions $\Pr(X_{50} = x_i)$ are contrasted.

We define the lumped transition function

$$\bar{\alpha}\_j(\bar{x}) = \sum\_{x \in \bar{x}} \alpha\_j(x) \tag{13}$$

for macro-state $\bar{x}$ and reaction $j$. As an example, consider the mass-action reaction $2X \xrightarrow{c} \emptyset$. For the macro-state $\bar{x} = \{0, \dots, n\}$ we can compute the corresponding lumped transition rate

$$\bar{\alpha}(\bar{x}) = \frac{c}{2} \sum\_{i=1}^{n} i(i-1) = \frac{c}{2} \sum\_{i=1}^{n} (i^2 - i) = \frac{c}{2} \left( \frac{2n^3 + 3n^2 + n}{6} - \frac{n^2 + n}{2} \right),$$

eliminating the explicit summation in the lumped propensity function.
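The closed form can be checked against the explicit summation of (13); the values of c and n below are arbitrary test inputs:

```python
# Sketch: the lumped mass-action rate for 2X --c--> 0 over the macro-state
# {0,...,n}, once via explicit summation (13) and once via the closed form
# obtained from Faulhaber's formulae. c and n are arbitrary test values.
def lumped_rate_sum(c, n):
    return (c / 2) * sum(i * (i - 1) for i in range(1, n + 1))

def lumped_rate_closed(c, n):
    return (c / 2) * ((2 * n**3 + 3 * n**2 + n) / 6 - (n**2 + n) / 2)

for n in (1, 5, 32, 100):
    assert abs(lumped_rate_sum(0.7, n) - lumped_rate_closed(0.7, n)) < 1e-9
```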

For polynomial propensity functions α such formulae are easily obtained automatically. For non-polynomial propensity functions, we can use the continuous integral as an approximation. This is demonstrated on a case study in Section 5.2.

Using the transition set computation (11) and the lumped propensity function (13) we can populate the Q-matrix of the finite lumping approximation:

$$\boldsymbol{Q}\_{\bar{x}\_i,\bar{x}\_k} = \begin{cases} \sum\_{j=1}^{n\_R} \bar{\alpha}\_j \left( \bar{x}\_{i\stackrel{j}{\rightarrow}k} \right) / \mathrm{vol}\left( \bar{x}\_i \right), & \text{if } \bar{x}\_i \neq \bar{x}\_k, \\\ -\sum\_{j=1}^{n\_R} \bar{\alpha}\_j \left( \bar{x}\_{i\stackrel{j}{\rightarrow}} \right) / \mathrm{vol}\left( \bar{x}\_i \right), & \text{otherwise.} \end{cases} \tag{14}$$

In addition to evaluating the lumped rate function over the transition state $\bar{x}_{i\stackrel{j}{\rightarrow}k}$, we need to divide by the total volume of the lumped state $\bar{x}_i$. This is due to the assumption of a uniform distribution inside the macro-states. Using this $Q$-matrix, we can compute the forward and backward solutions using the respective Kolmogorov equations (5) and (7).
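The interval operations behind the transition sets of (11), used when populating this matrix, reduce to a shift, a box intersection, and a back-shift. A sketch with assumed example boxes:

```python
# Sketch of eq. (11): the micro-states of box macro-state x_i that reach box
# x_k via change vector v_j, computed by shift, intersection, and back-shift.
# Boxes are (lower, upper) corner pairs; the concrete numbers are assumptions.

def shift(box, v):
    lo, up = box
    return (tuple(l + d for l, d in zip(lo, v)),
            tuple(u + d for u, d in zip(up, v)))

def intersect(a, b):
    lo = tuple(max(x, y) for x, y in zip(a[0], b[0]))
    up = tuple(min(x, y) for x, y in zip(a[1], b[1]))
    return None if any(l > u for l, u in zip(lo, up)) else (lo, up)

def transition_set(x_i, x_k, v):
    """((x_i + v) ∩ x_k) - v, following eq. (11); None if empty."""
    hit = intersect(shift(x_i, v), x_k)
    return None if hit is None else shift(hit, tuple(-d for d in v))

x_i = ((0, 0), (15, 15))             # macro-state [0,15] x [0,15]
x_k = ((16, 0), (31, 15))            # right neighbour
v = (1, 0)                           # reaction effect: one unit in dim 0
boundary = transition_set(x_i, x_k, v)   # only the boundary column crosses
```

Only the rightmost column of $\bar{x}_i$ can cross into the neighbour, and the result is again an interval-defined box, as claimed in the text.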

Interestingly, the lumped distribution tends to be less concentrated. This is due to the assumption of a uniform distribution inside macro-states. This effect is illustrated by the example of a birth-death process in Figure 2. Due to this effect, an iterative refinement typically keeps an over-approximation in terms of state-space area. This is a desirable feature since relevant regions are less likely to be pruned due to lumping approximations.

#### **4.3 Iterative Refinement Algorithm**

The iterative refinement algorithm (Alg. 1) starts with a set of large macro-states that are iteratively refined, based on approximate solutions to the bridging problem. We start by constructing square macro-states of size $2^m$ in each dimension for some $m \in \mathbb{N}$ such that they form a large-scale grid $\mathcal{S}^{(0)}$. Hence, each initial macro-state has a volume of $(2^m)^{n_S}$. This choice of grid size is convenient because we can halve states in each dimension. Moreover, it ensures that all states have equal volume and that we end up with states of volume $2^0 = 1$, which is equivalent to a truncation of the original non-lumped state-space.

An iteration of the state-space refinement starts by computing both the forward and backward probabilities (lines 2 and 3) via integration of (5) and (7), respectively, using the lumped $\hat{Q}$-matrix. Based on the resulting approximate forward and backward probabilities, we compute an approximation of the bridging distributions (line 4). This is done for each time-point in an equispaced grid on $[0, T]$. The time grid granularity is a hyper-parameter of the algorithm. If the grid is too fine, the memory overhead of storing the backward solutions $\hat{\beta}^{(i)}$ and forward solutions $\hat{\pi}^{(i)}$ increases.<sup>2</sup> If, on the other hand, the granularity is too low, too much of the state-space might be truncated. Based on a threshold parameter $\delta > 0$, states are either removed or split (line 7), depending on the mass assigned to them by the approximate bridging probabilities $\hat{\gamma}^{(i)}_t$. A state is split by the split-function, which halves the state in each dimension; otherwise it is removed. Thus, each macro-state is either split into $2^{n_S}$ new states or removed entirely. The result forms the next lumped state-space $\mathcal{S}^{(i+1)}$. The $Q$-matrix is adjusted (line 10) such that transition rates for $\mathcal{S}^{(i+1)}$ are calculated according to (14). Entries of truncated states are removed from the transition matrix. Transitions leading to them are re-directed to a sink state (see Section 4.1). After $m$ iterations (we started with states of side length $2^m$) we have a standard finite state projection scheme on the original model, tailored to computing an approximation of the bridging distribution.
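A possible sketch of the split-function on hypercube macro-states follows; the representation of a box as a pair of integer corner vectors is an assumption:

```python
from itertools import product

# Sketch of the split-function used in Algorithm 1: halve a hypercube
# macro-state (given by lower/upper corner vectors with even side length)
# in every dimension, yielding 2^{n_S} children. The corner-pair box
# representation is an assumption for illustration.
def split(lo, up):
    halves = []
    for l, u in zip(lo, up):
        mid = (l + u + 1) // 2       # integer midpoint; side length is even
        halves.append([(l, mid - 1), (mid, u)])
    # every combination of per-dimension halves is one child box
    return [tuple(zip(*dims)) for dims in product(*halves)]

children = split((0, 0), (15, 15))   # four 8x8 children of a 16x16 parent
```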

In Figure 3 we demonstrate how Algorithm 1 refines the state-space iteratively. Starting with an initial lumped state-space $\mathcal{S}^{(0)}$ covering a large area of the state-space, repeated evaluations of the bridging distributions are performed. After five iterations the remaining truncation includes all states that significantly contribute to the bridging probabilities over the times $[0, T]$.

It is important to realize that determining the most relevant states is the main challenge. The above algorithm solves this problem by considering only

<sup>2</sup> We denote the approximations with a hat (e.g. $\hat{\pi}$) rather than a bar (e.g. $\bar{\pi}$) to indicate that not only the lumping approximation but also a truncation is applied, and similarly for the $Q$-matrix.

#### **Algorithm 1:** Iterative refinement for the bridging problem

**input :** initial partitioning $\mathcal{S}^{(0)}$, truncation threshold $\delta$
**output:** approximate bridging distribution $\hat{\gamma}$

```
 1  for i = 1, ..., m do
 2      π̂_t^(i) ← solve approximate forward equation on S^(i)
 3      β̂_t^(i) ← solve approximate backward equation on S^(i)
 4      γ̂_t^(i) ← β̂_t^(i) π̂_t^(i) / π̂(x_g, T)   /* approximate bridging distribution */
 5      S^(i+1) ← ∅
 6      foreach x̄ ∈ S^(i) do
 7          if ∃t. γ̂_t^(i)(x̄) ≥ δ              /* refine based on bridging probabilities */
 8          then
 9              S^(i+1) ← S^(i+1) ∪ split(x̄)
10      update Q̂-matrix
11  return γ̂^(i)
```

**Fig. 3.** The state-space refinement algorithm on two parallel unit-rate arrival processes: the bridging problem from (0, 0) to (64, 64) with T = 10 and truncation threshold δ = 5e-3. States with a bridging probability below δ are light grey. The macro-state containing the goal state is marked in black. The initial macro-states are of size 16×16.

those parts of the state-space that contribute most to the bridging probabilities. The truncation is tailored to this condition and might ignore regions that are likely in the unconditioned case. For instance, in Fig. 1 the bridging probabilities mostly remain below a population threshold of $\#X = 60$ (as indicated by the lighter/darker coloring), while the forward probabilities mostly exceed this bound. Hence, in this example a significant portion of the forward probabilities $\hat{\pi}^{(i)}_t$ is captured by the sink state. However, the condition in line 7 of Algorithm 1 ensures that states contributing significantly to $\hat{\gamma}^{(i)}_t$ are kept and refined in the next iteration.

### **5 Results**

We present four examples in this section to evaluate our proposed method. A prototype was implemented in Python 3.8. For numerical integration we


**Table 1.** Estimated reachability probabilities based on varying truncation thresholds δ: The true probability is 1.8625e-29. We also report the size of the final truncation and the accumulated size of all truncations during refinement iterations (overall states).

used the Scipy implementation [47] of the implicit method based on backward-differentiation formulas [13]. The analysis is made available online as a Jupyter notebook.<sup>3</sup>

#### **5.1 Bounding Rare Event Probabilities**

We consider a simple model of two parallel Poisson processes describing the production of two types of agents. The corresponding probability distribution has Poisson product form at all time points $t \ge 0$, and hence we can compare the accuracy of our numerical results with the exact analytic solution. We use the proposed approach to compute lower bounds for rare event probabilities.<sup>4</sup>

**Model 2 (Parallel Poisson Processes).** The model consists of two parallel independent Poisson processes with unit rates.

$$
\emptyset \xrightarrow{1} A \qquad \text{and} \qquad \emptyset \xrightarrow{1} B
$$

The initial condition $X\_0 = (0, 0)$ holds with probability one. After $t$ time units, each species abundance is Poisson distributed with rate $\lambda = t$.

We consider the final constraint of reaching a state where both processes exceed a threshold of 64 at time 20. Without prior knowledge, a reasonable truncation would have been 160×160. But our analysis shows that just 20% of the states are necessary to capture over 99.6% of the probability mass reaching the target event (cf. Table 1). Decreasing the threshold δ leads to a larger set of states retained after truncation as more of the bridging distribution is included (cf. Figure 4). We observe an increase in truncation size that is approximately logarithmic in δ, which, in this example, indicates robustness of the method with respect to the choice of δ.
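Since the two processes are independent, the exact probability of the terminal event is simply the square of a Poisson tail, which is how the reference value in Table 1 can be reproduced; a stdlib-only sketch:

```python
import math

def poisson_tail(lam, k):
    """P(X >= k) for X ~ Poisson(lam): sum the pmf upward from k until terms vanish."""
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))  # P(X = k)
    total = 0.0
    j = k
    while j == k or term > total * 1e-20:
        total += term
        j += 1
        term *= lam / j          # P(X = j) from P(X = j - 1)
    return total

# Both unit-rate processes are Poisson(t) at time t; at T = 20 the target event
# (both counts at least 64) factorizes into a squared tail probability.
p_rare = poisson_tail(20.0, 64) ** 2
print(p_rare)  # ~1.86e-29, matching the reference value in Table 1
```

Working in log-space for the first term avoids underflow of the factorial; the incremental recurrence then keeps the summation cheap and stable.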

<sup>3</sup> https://www.github.com/mbackenkoehler/mjp_bridging

<sup>4</sup> These bounds are rigorous up to the approximation error of the numerical integration scheme. However, the forward solution could be replaced by an adaptive uniformization approach [3] for a more rigorous integration error control.

**Fig. 4.** State-space truncation for varying values of the threshold parameter δ: Two parallel Poisson processes under terminal constraints $X^{(A)}\_{20} \ge 64$ and $X^{(B)}\_{20} \ge 64$. The initial macro-states are 16 × 16 such that the final states are regular micro-states.

**Comparison to Other Methods** The truncation approach that we apply is similar to the one used by Mikeev et al. [34] for rare event estimation. However, they relied on a given linearly biased MJP model to obtain a truncation, and no general strategy to compute an appropriate biasing was proposed. It is possible to adapt our truncation approach to the dynamic scheme in Ref. [34], where states are removed on the fly during numerical integration.

A finite state-space truncation covering the same area as the initial lumping approximation would contain 25,600 states.<sup>5</sup> The standard approach would be to build up the entire state-space for such a model [27]. Even with a conservative truncation threshold δ = 1e-5, our method yields an accurate estimate using only about a fifth of this number (5,450 states), accumulated over all intermediate lumped approximations.

#### **5.2 Mode Switching**

Mode switching occurs in models exhibiting multi-modal behavior [44] when a trajectory traverses a potential barrier from one mode to another. Often, mode switching is a rare event and occurs in the context of gene regulatory networks where a mode is characterized by the set of genes being currently active [30]. Similar dynamics also commonly occur in queuing models where a system may for example switch its operating behavior stochastically if traffic increases above or decreases below certain thresholds. Using the presented method, we can get both a qualitative and quantitative understanding of switching behavior without resorting to Monte-Carlo methods such as importance sampling.

**Exclusive Switch** The exclusive switch [7] has three different modes of operation, depending on the DNA state, i.e. on whether a protein of type one or two is bound to the DNA.

<sup>5</sup> Here, the goal is not treated as a single state. Otherwise, it consists of 24,130 states.

**Model 3 (Exclusive Switch).** The exclusive switch model consists of a promoter region that can express both proteins $P\_1$ and $P\_2$. Both can bind to the region, suppressing the expression of the other protein. For certain parameterizations, this leads to a bi-modal or even tri-modal behavior.

$$D \xrightarrow{\rho} D + P\_1 \qquad D \xrightarrow{\rho} D + P\_2 \qquad P\_1 \xrightarrow{\lambda} \emptyset \qquad P\_2 \xrightarrow{\lambda} \emptyset$$

$$D + P\_1 \xrightarrow{\beta} D.P\_1 \qquad D.P\_1 \xrightarrow{\gamma} D + P\_1 \qquad D.P\_1 \xrightarrow{\alpha} D.P\_1 + P\_1$$

$$D + P\_2 \xrightarrow{\beta} D.P\_2 \qquad D.P\_2 \xrightarrow{\gamma} D + P\_2 \qquad D.P\_2 \xrightarrow{\alpha} D.P\_2 + P\_2$$

The parameter values are ρ = 1e-1, λ = 1e-3, β = 1e-2, γ = 8e-3, and α = 1e-1.

Since the three distinct operating modes are known a priori, we adjust the method slightly: the state-space for the DNA states is not lumped. Instead, we "stack" lumped approximations of the $P\_1$-$P\_2$ phase space upon each other. Special treatment of DNA states is common for such models [28].

To analyze the switching, we choose the transition from $x\_1 = (32, 0, 0, 0, 1)$ to $x\_2 = (0, 32, 0, 1, 0)$ (variable order: $P\_1$, $P\_2$, $D$, $D.P\_1$, $D.P\_2$) over the time interval $t \in [0, 10]$. The initial lumping scheme covers up to 80 molecules of $P\_1$ and $P\_2$ for each mode. Macro-states have size 8×8 and the truncation threshold is δ = 1e-4.

In the analysis of biological switches, not only the switching probability but also the switching dynamics are central to understanding the underlying biological mechanisms. In Figure 5 (left), we therefore plot the time-varying probabilities of the gene state conditioned on the mode. We observe a rapid unbinding of $P\_2$, followed by a slow increase of the binding probability for $P\_1$. These dynamics are already qualitatively captured by the first lumped approximation (dashed lines).

**Toggle Switch** Next, we apply our method to a toggle switch model exhibiting non-polynomial rate functions. This well-known model considers two proteins A and B inhibiting the production of the respective other protein [29].

**Model 4 (Toggle Switch with Hill Functions).** We have population types A and B with the following reactions and reaction rates.

$$\emptyset \xrightarrow{\alpha\_1(\cdot)} A \quad , \quad where \quad \alpha\_1(x) = \frac{\rho}{1 + x\_B}, \qquad A \xrightarrow{\lambda} \emptyset$$

$$\emptyset \xrightarrow{\alpha\_2(\cdot)} B \quad , \quad where \quad \alpha\_2(x) = \frac{\rho}{1 + x\_A}, \qquad B \xrightarrow{\lambda} \emptyset$$

The parameterization is ρ = 10, λ = 0.1.

Due to the non-polynomial rate functions $\alpha\_1$ and $\alpha\_2$, the transition rates between macro-states are approximated by the continuous integral

$$\bar{\alpha}\_1(\bar{x}) \approx \int\_{a-0.5}^{b+0.5} \frac{\rho}{1+x} \, dx = \rho \left( \log \left( b + 1.5 \right) - \log \left( a + 0.5 \right) \right)$$

**Fig. 5.** (left) Mode probabilities of the exclusive switch bridging problem over time for the first lumped approximation (dashed lines) and the final approximation (solid lines) with constraints $X\_0 = (32, 0, 0, 1, 0)$ and $X\_{10} = (0, 32, 0, 0, 1)$. (right) The expected occupation time (excluding initial and terminal states) for the switching problem of the toggle switch using Hill-type functions. The bridging problem is from initial (0, 120) to a first passage of (120, 0) in $t \in [0, 10]$.

for a macro-state $\bar{x} = \{a, \dots, b\}$.
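The closed form can be sanity-checked against the exact aggregate rate, i.e. the sum of ρ/(1 + x) over the integer states of the macro-state, which the integral approximates by the midpoint rule; a small sketch (not part of the paper's implementation):

```python
import math

rho = 10.0  # production rate constant of Model 4

def lumped_rate_exact(a, b):
    # exact aggregate: sum of the Hill-type rate rho / (1 + x) over {a, ..., b}
    return sum(rho / (1 + x) for x in range(a, b + 1))

def lumped_rate_integral(a, b):
    # closed form of the integral over [a - 0.5, b + 0.5] given in the text
    return rho * (math.log(b + 1.5) - math.log(a + 0.5))

a, b = 32, 63  # a 32-wide macro-state, as in the toggle switch lumping
print(lumped_rate_exact(a, b), lumped_rate_integral(a, b))
```

For this macro-state the two values agree to roughly four significant digits, since the integrand varies slowly over each unit interval.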

We analyze the switching scenario from (0, 120) to the first visit of state (120, 0) up to time T = 10. The initial lumping scheme covers up to 352 molecules of A and B, and macro-states have size 32 × 32. The truncation threshold is δ = 1e-4. The resulting truncation is shown in Figure 5 (right). It also illustrates the kind of insights that can be obtained from the bridging distributions. For an overview of the switching dynamics, we look at the expected occupation time under the terminal constraint of having entered state (120, 0). Letting the corresponding hitting time be $\tau = \inf\{t \ge 0 \mid X\_t = (120, 0)\}$, the expected occupation time for some state $x$ is $E\left[\int\_0^{\tau} 1\_{\{X\_t = x\}} \, dt \mid \tau \le 10\right]$. We observe that in this example the switching behavior appears to be asymmetrical: the main probability mass passes through an area where initially a small number of A molecules is produced, followed by a total decay of B molecules.

#### **5.3 Recursive Bayesian Estimation**

We now turn to the method's application in recursive Bayesian estimation. This is the problem of estimating the system's past, present, and future behavior under given observations. Thus, the MJP becomes a hidden Markov model (HMM). The observations in such models are usually noisy, meaning that we cannot infer the system state with certainty.

This estimation problem entails more general distributional constraints on the terminal distribution β(·, T) and the initial distribution π(·, 0) than the point mass distributions considered up until now. The forward and backward probabilities extend naturally to such initial and terminal distributions. For the forward probabilities we get

$$\pi(x\_i, t) = \sum\_j \Pr(X\_t = x\_i \mid X\_0 = x\_j) \pi(x\_j, 0), \tag{15}$$

and similarly the backward probabilities are given by

$$\beta(x\_i, t) = \sum\_j \Pr(X\_T = x\_j \mid X\_t = x\_i) \beta\_T(x\_j) \,. \tag{16}$$
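On a toy three-state CTMC, (15) and (16) can be checked numerically together with the normalization of the resulting bridging distribution; the uniformization routine below is our own stand-in for the paper's numerical integrator:

```python
import math

# Toy 3-state CTMC (a small birth-death chain); generator, initial and
# terminal distributions are illustrative choices, not the paper's models.
Q = [[-1.0, 1.0, 0.0],
     [0.5, -1.5, 1.0],
     [0.0, 0.5, -0.5]]

def expm_uniformization(Q, t, terms=200):
    """Transition matrix P(t) = e^{Qt} via uniformization (Poisson-weighted
    powers of the uniformized DTMC matrix U = I + Q/u)."""
    n = len(Q)
    u = 1.1 * max(-Q[i][i] for i in range(n))               # uniformization rate
    U = [[(1.0 if i == j else 0.0) + Q[i][j] / u for j in range(n)] for i in range(n)]
    Uk = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # U^0
    P = [[0.0] * n for _ in range(n)]
    for k in range(terms):
        w = math.exp(-u * t + k * math.log(u * t) - math.lgamma(k + 1))  # Poisson weight
        for i in range(n):
            for j in range(n):
                P[i][j] += w * Uk[i][j]
        Uk = [[sum(Uk[i][l] * U[l][j] for l in range(n)) for j in range(n)] for i in range(n)]
    return P

T, t, n = 2.0, 0.7, 3
pi0 = [0.6, 0.4, 0.0]      # general initial distribution pi(., 0)
betaT = [0.0, 0.2, 1.0]    # terminal weights beta(., T), e.g. a likelihood
Pt = expm_uniformization(Q, t)
PTt = expm_uniformization(Q, T - t)
pi_t = [sum(Pt[j][i] * pi0[j] for j in range(n)) for i in range(n)]       # eq. (15)
beta_t = [sum(PTt[i][j] * betaT[j] for j in range(n)) for i in range(n)]  # eq. (16)
Z = sum(pi_t[i] * beta_t[i] for i in range(n))  # normalizer, independent of t
gamma_t = [pi_t[i] * beta_t[i] / Z for i in range(n)]  # bridging distribution
print(gamma_t)
```

That the bridging distribution sums to one for every t follows from the Chapman-Kolmogorov equation, since the normalizer is independent of the intermediate time point.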

We apply our method to an SEIR (susceptible-exposed-infected-removed) model. This is widely used to describe the spreading of an epidemic such as the current COVID-19 outbreak [23,20]. Temporal snapshots of the epidemic spread are mostly only available for a subset of the population and suffer from inaccuracies of diagnostic tests. Bayesian estimation can then be used to infer the spreading dynamics given uncertain temporal snapshots.

**Model 5 (Epidemics Model).** A population of susceptible individuals can contract a disease from infected agents. In this case, they are exposed, meaning they will become infected but cannot yet infect others. After being infected, individuals change to the removed state. The mass-action reactions are as follows.

$$S + I \xrightarrow{\lambda} E + I \qquad E \xrightarrow{\mu} I \qquad \quad I \xrightarrow{\rho} R$$

The parameter values are λ = 0.5, μ = 3, ρ = 3. Due to the stoichiometric invariant $X^{(S)}\_t + X^{(E)}\_t + X^{(I)}\_t + X^{(R)}\_t = \mathrm{const.}$, we can eliminate R from the system.

We consider the following scenario: We know that initially (t = 0) one individual is infected and the rest are susceptible. At time t = 0.3 all individuals are tested for the disease. The test, however, only identifies infected individuals with probability 0.99. Moreover, the probability of a false positive is 0.05. We would like to identify the distribution given both the initial state and the measurement at time t = 0.3. In particular, we want to infer the distribution over the latent counts of S and E by recursive Bayesian estimation.

The posterior for $n\_I$ infected individuals at time t, given the measurement $Y\_t = \hat{n}\_I$, can be computed using Bayes' rule

$$\Pr(X\_t^{(I)} = n\_I \mid Y\_t = \hat{n}\_I) \propto \Pr(Y\_t = \hat{n}\_I \mid X\_t^{(I)} = n\_I) \Pr(X\_t^{(I)} = n\_I) \,. \tag{17}$$
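For the test model described above, the likelihood in (17) is a convolution of two binomials (hit rate 0.99 on the infected individuals, false-positive rate 0.05 on the rest). A sketch of the update with a placeholder uniform prior (standing in for the forward solution) and a hypothetical population size N = 50:

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k) if 0 <= k <= n else 0.0

def likelihood(n_obs, n_inf, n_total, p_hit=0.99, p_fp=0.05):
    # observed positives = true positives among the infected plus false
    # positives among the remaining individuals (a binomial convolution)
    return sum(binom_pmf(k, n_inf, p_hit) * binom_pmf(n_obs - k, n_total - n_inf, p_fp)
               for k in range(min(n_obs, n_inf) + 1))

N, n_obs = 50, 30                       # hypothetical population and measurement
prior = [1.0 / (N + 1)] * (N + 1)       # placeholder for the forward solution
post = [likelihood(n_obs, n, N) * prior[n] for n in range(N + 1)]
Z = sum(post)
post = [p / Z for p in post]            # Bayes' rule (17), normalized
mode = max(range(N + 1), key=lambda n: post[n])
print(mode)
```

The posterior mode sits slightly below the raw count of 30 positives, since a few of the positives are expected to be false.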

This problem is an extension of the bridging problem discussed up until now. The difference is that the terminal posterior is estimated using the result of the lumped forward equation and the measurement distribution via (17). Based on this estimated terminal posterior, we compute the bridging probabilities and refine the truncation tailored to the location of the posterior distribution. In Figure 6 (left), we illustrate the bridging distribution between the terminal posterior and the initial distribution. In the context of filtering problems this is commonly referred to as smoothing. Using the learned truncation, we can obtain the posterior distribution for the number of infected individuals at t = 0.3 (Figure 6 (middle)). Moreover, we can infer a distribution over the unknown number of susceptible and exposed individuals (Figure 6 (right)).

**Fig. 6.** (left) A comparison of the prior dynamics and the posterior smoothing (bridging) dynamics. (middle) The prior, likelihood, and posterior of the number of infected individuals $n\_I$ at time t = 0.3 given the measurement $\hat{n}\_I = 30$. (right) The prior and posterior distribution over the latent types E and S.

### **6 Conclusion**

The analysis of Markov jump processes with constraints on the initial and terminal behavior is an important part of many probabilistic inference tasks, such as parameter estimation using Bayesian or maximum likelihood estimation, inference of latent system behavior, the estimation of rare event probabilities, and reachability analysis for the verification of temporal properties. If the endpoint constraints correspond to atypical system behaviors, standard analysis methods fail because they have no strategy to identify those parts of the state-space relevant for meeting the terminal constraint.

Here, we proposed a method that is not based on stochastic sampling and statistical estimation but provides a direct numerical approach. It starts with an abstract lumped model, which is iteratively refined such that only those parts of the model are considered that contribute to the probabilities of interest. In the final step of the iteration, we operate at the granularity of the original model and compute lower bounds for these bridging probabilities that are rigorous up to the error of the numerical integration scheme.

Our method exploits the population structure of the model, which is present in many important application fields of MJPs. Based on experience with other truncation-based approaches, the method can be expected to scale up to at least a few million states [33]. Compared to previous work, our method relies neither on approximations of unknown accuracy nor on additional information such as a suitable change of measure in the case of importance sampling. It only requires a truncation threshold and an initial choice for the macro-state sizes.

In future work, we plan to extend our method to hybrid approaches, in which a moment representation is employed for large populations while discrete counts are maintained for small populations. Moreover, we will apply our method to model checking where constraints are described by some temporal logic [21].

**Acknowledgements** This project was supported by the DFG project MULTI-MODE and Italian PRIN project SEDUCE.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Multi-objective Optimization of Long-run Average and Total Rewards

Tim Quatmann and Joost-Pieter Katoen

RWTH Aachen University, Aachen, Germany tim.quatmann@cs.rwth-aachen.de

**Abstract.** This paper presents an efficient procedure for multi-objective model checking of long-run average reward (aka: mean pay-off) and total reward objectives as well as their combination. We consider this for Markov automata, a compositional model that captures both traditional Markov decision processes (MDPs) as well as a continuous-time variant thereof. The crux of our procedure is a generalization of Forejt *et al.*'s approach for total rewards on MDPs to arbitrary combinations of long-run and total reward objectives on Markov automata. Experiments with a prototypical implementation on top of the Storm model checker show encouraging results for both model types and indicate substantially improved performance over existing multi-objective long-run MDP model checking based on linear programming.

### 1 Introduction

*MDP model checking* In various applications, multiple decision criteria and uncertainty frequently co-occur. Stochastic decision processes in which multiple, possibly partly conflicting, objectives are to be achieved arise in various fields, including operations research, economics, planning in AI, and game theory. This has stimulated model checking of Markov decision processes (MDPs) [46], a prominent model for decision making under uncertainty, against multiple objectives. This development enlarges the rich collection of automated MDP verification algorithms against single objectives [7].

*Multi-objective MDP* Various types of objectives known from conventional, single-objective model checking have been lifted to the multi-objective case. These objectives range over ω-regular specifications including LTL [26,27], expected (discounted and non-discounted) total rewards [21,27,28,52,22], step-bounded and reward-bounded reachability probabilities [28,35], and, most relevant for this work, *expected long-run average (LRA) rewards* [18,11,20], also known as mean pay-offs. For the latter, all current approaches build upon linear programming (LP), which yields a theoretical time complexity polynomial in the model size. However, in practice, LP-based methods are often outperformed by approaches based on value or strategy iteration [28,1,42]. The LP-based approach of [27] and the iterative approach of [28] are both implemented in PRISM [45] and Storm [40]. The LP formulation of [11,20] is implemented in MultiGain [12], an extension of PRISM for multi-objective LRA rewards.

© The Author(s) 2021

J. F. Groote and K. G. Larsen (Eds.): TACAS 2021, LNCS 12651, pp. 230–249, 2021. https://doi.org/10.1007/978-3-030-72016-2\_13

*Contributions of this paper* We present a computationally efficient procedure for multi-objective model checking of LRA reward and total reward objectives as well as their mixture. The crux of our procedure is *a generalization* of Forejt *et al.*'s iterative approach [28] for total rewards on MDPs *to expected LRA reward objectives*. In fact, our approach supports arbitrary *mixtures* of expected LRA and total reward objectives. To our knowledge, such mixtures have not been considered so far. Experiments on various benchmarks using a prototypical implementation in Storm indicate that this generalized iterative algorithm outperforms the LP approach implemented in MultiGain.

In addition, we extend this approach towards *Markov automata* (MA) [25,23], a continuous-time variant of MDPs that is amenable to compositional modeling. This model is well-suited, among others, to provide a formal semantics for dynamic fault trees and generalized stochastic Petri nets [24]. Our multi-objective LRA approach for MA builds upon the value-iteration approach for single-objective expected LRA rewards on MA [17], which, on practical models, outperforms the LP-based approach of [30]. To the best of our knowledge, this is the *first multi-objective expected LRA reward approach for MA*. Experimental results on MA benchmarks show that the treatment of a continuous-time variant of LRA comes at almost no time penalty compared to the MDP setting.

*Other related work* Mixtures of various other objectives have been considered for MDPs. This includes conditional expectations or ratios of reward functions [5,4]. [31] considers LTL formulae with probability thresholds while maximizing an expected LRA reward. [35,41] address multi-objective quantiles on reachability properties while [50,20] consider multi-objective combinations of percentile queries on MDP and LRA objectives. [6] treats resilient systems ensuring constraints on the repair mechanism while maximizing the expected LRA reward when being operational. The trade-off between expected LRA rewards and their variance is analyzed in [13]. [33] studies multiple objectives on interval MDP, where transition probabilities can be specified as intervals in cases where the concrete probabilities are unknown. Multiple LRA reward objectives for *stochastic games* have been treated using LP [19] and value iteration over convex sets [8,9]; the latter is included in PRISM-games [44,43]. These approaches can also be applied to MDPs when viewed as one-player stochastic games. Algorithms for single-objective model checking of MA deal with objectives such as expected total rewards, time-bounded reachability probabilities, and expected long-run average rewards [38,29,30,15]. The only multi-objective approach for MA so far [47] shows that any method for multi-objective MDP can be applied on (a discretized version of) an MA for queries involving unbounded or time-bounded reachability probabilities and expected total rewards, but no long-run average rewards.

### 2 Preliminaries

The set of *probability distributions* over a finite set $\Omega$ is given by $Dist(\Omega) = \{\mu \colon \Omega \to [0, 1] \mid \sum\_{\omega \in \Omega} \mu(\omega) = 1\}$. For a distribution $\mu \in Dist(\Omega)$ we let $supp(\mu) = \{\omega \in \Omega \mid \mu(\omega) > 0\}$ denote its support. $\mu$ is *Dirac* if $|supp(\mu)| = 1$.

Let $\mathbb{R}\_{\ge 0} = \{x \in \mathbb{R} \mid x \ge 0\}$, $\mathbb{R}\_{> 0} = \{x \in \mathbb{R} \mid x > 0\}$, and $\bar{\mathbb{R}} = \mathbb{R} \cup \{-\infty, \infty\}$ denote the non-negative, positive, and extended real numbers, respectively. For a point $\mathbf{p} = \langle p\_1, \dots, p\_\ell \rangle \in \mathbb{R}^\ell$ with $\ell \in \mathbb{N}$ and $i \in \{1, \dots, \ell\}$ we write $\mathbf{p}\_i$ for its $i$-th entry $p\_i$. For $\mathbf{p}, \mathbf{q} \in \mathbb{R}^\ell$ let $\mathbf{p} \cdot \mathbf{q}$ denote the dot product. We further write $\mathbf{p} \le \mathbf{q}$ iff $\forall i \colon \mathbf{p}\_i \le \mathbf{q}\_i$ and $\mathbf{p} \lneq \mathbf{q}$ iff $\mathbf{p} \le \mathbf{q} \wedge \mathbf{p} \ne \mathbf{q}$. The *closure* of a set $P \subseteq \mathbb{R}^\ell$ is the union of $P$ and its boundary, denoted by $cl(P)$. The *convex hull* of $P$ is given by $conv(P) = \{\sum\_{i=1}^{k} \mu(i) \cdot \mathbf{p}\_i \mid k \in \mathbb{N}, \mu \in Dist(\{1, \dots, k\}), \mathbf{p}\_1, \dots, \mathbf{p}\_k \in P\}$. The *downward convex hull* of $P$ is given by $dwconv(P) = \{\mathbf{q} \in \mathbb{R}^\ell \mid \exists\, \mathbf{p} \in conv(P) \colon \mathbf{q} \le \mathbf{p}\}$.
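The point-wise order and the downward convex hull are the geometric backbone of the multi-objective machinery; a small sketch of membership in dwconv(P) for two points, brute-forcing the convex weight rather than solving a linear program (an illustration of the definitions, not an implementation used in the paper):

```python
def leq(p, q):
    # componentwise order p <= q
    return all(pi <= qi for pi, qi in zip(p, q))

def strictly_below(p, q):
    # p <= q and p != q (the strict variant of the order)
    return leq(p, q) and p != q

def in_dwconv_two_points(q, p1, p2, steps=10_001):
    # q in dwconv({p1, p2}) iff q <= mu*p1 + (1-mu)*p2 for some mu in [0, 1];
    # the weight mu is scanned on a grid here for simplicity
    for i in range(steps):
        mu = i / (steps - 1)
        mix = tuple(mu * a + (1 - mu) * b for a, b in zip(p1, p2))
        if leq(q, mix):
            return True
    return False

p1, p2 = (1.0, 0.0), (0.0, 1.0)
print(in_dwconv_two_points((0.5, 0.5), p1, p2))   # on the segment: True
print(in_dwconv_two_points((0.6, 0.6), p1, p2))   # outside the hull: False
```

In the actual algorithms such membership queries are answered exactly via linear programming; the grid scan only serves to make the definition concrete.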

### 2.1 Markov Automata

Markov automata (MA) [25,23] provide an expressive formalism that allows one to model exponentially distributed delays, nondeterminism, probabilistic branching, and instantaneous (undelayed) transitions.

Definition 1. *A* Markov automaton *is a tuple* $\mathcal{M} = \langle S, Act, \Delta, \mathbf{P} \rangle$ *where* $S$ *is a finite set of states,* $Act$ *is a finite set of actions,* $\Delta \colon S \to \mathbb{R}\_{>0} \cup 2^{Act}$ *is a transition function assigning exit rates to Markovian states* $MS^{\mathcal{M}} = \{s \in S \mid \Delta(s) \in \mathbb{R}\_{>0}\}$ *and sets of enabled actions to probabilistic states* $PS^{\mathcal{M}} = \{s \in S \mid \Delta(s) \subseteq Act\}$*, and* $\mathbf{P} \colon MS^{\mathcal{M}} \cup SA^{\mathcal{M}} \to Dist(S)$ *with* $SA^{\mathcal{M}} = \{\langle s, \alpha \rangle \in PS^{\mathcal{M}} \times Act \mid \alpha \in \Delta(s)\}$ *is a probability function that assigns a distribution over possible successor states to each Markovian state and each enabled state-action pair.*

Let $\mathcal{M} = \langle S, Act, \Delta, \mathbf{P} \rangle$ be an MA. If $\mathcal{M}$ is clear from the context, we may omit the superscript from $MS^{\mathcal{M}}$, $PS^{\mathcal{M}}$, $SA^{\mathcal{M}}$, and further notations introduced below. Intuitively, the time $\mathcal{M}$ stays in a Markovian state $s \in MS$ is governed by an *exponential distribution* with rate $\Delta(s) \in \mathbb{R}\_{>0}$, i.e., the probability to take a transition from $s$ within $t \in \mathbb{R}\_{\ge 0}$ time units is $1 - e^{-\Delta(s) \cdot t}$. Upon taking a transition, a successor state $s' \in S$ is drawn from the distribution $\mathbf{P}(s)$, i.e., $\mathbf{P}(s)(s')$ is the probability that the transition leads to $s' \in S$. For probabilistic states $\hat{s} \in PS$, an enabled action $\alpha \in \Delta(\hat{s})$ has to be picked and a successor state is drawn from $\mathbf{P}(\langle \hat{s}, \alpha \rangle)$ (without any delay). Nondeterminism is thus only possible at probabilistic states. We assume deadlock-free MA, i.e., $\forall s \in PS^{\mathcal{M}} \colon \Delta(s) \ne \emptyset$.

*Remark 1.* To enable more flexible modeling such as parallel composition, the literature (e.g., [25,30]) often considers a more liberal variant of MA where (i) different successor distributions can be assigned to the same state-action pair and (ii) states can be both Markovian *and* probabilistic. MAs as in Definition 1 (also known as closed MA) are equally expressive: they can be constructed via action renaming and by applying the so-called *maximal progress assumption* [25].

An *infinite path* in $\mathcal{M}$ is a sequence $\pi = s\_0 \kappa\_1 s\_1 \kappa\_2 \dots$ where for each $i \ge 0$ either $s\_i \in MS$, $\kappa\_{i+1} \in \mathbb{R}\_{\ge 0}$, and $\mathbf{P}(s\_i)(s\_{i+1}) > 0$, or $s\_i \in PS$, $\kappa\_{i+1} \in \Delta(s\_i)$, and $\mathbf{P}(\langle s\_i, \kappa\_{i+1} \rangle)(s\_{i+1}) > 0$. Intuitively, if $s\_i$ is Markovian, $\kappa\_{i+1} \in \mathbb{R}\_{\ge 0}$ reflects the time we have stayed in $s\_i$ until transitioning to $s\_{i+1}$. If $s\_i$ is probabilistic, $\kappa\_{i+1} \in Act$ is the performed action via which we transition to $s\_{i+1}$. A finite path $\hat{\pi} = s\_0 \kappa\_1 s\_1 \kappa\_2 \dots \kappa\_n s\_n$ is a finite prefix of an infinite path $\pi$. We set $last(\hat{\pi}) = s\_n$ and $|\hat{\pi}| = n$ for finite $\hat{\pi}$, and $|\pi| = \infty$ for infinite $\pi$. For a (finite or infinite) path $\bar{\pi} = s\_0 \kappa\_1 s\_1 \kappa\_2 \dots$ let $dur(\bar{\pi}) = \sum\_{i=1}^{|\bar{\pi}|} dur(\kappa\_i)$ be the total duration of $\bar{\pi}$, where $dur(\kappa) = \kappa$ if $\kappa \in \mathbb{R}\_{\ge 0}$ and $0$ otherwise. If $\bar{\pi}$ is infinite and $dur(\bar{\pi}) < \infty$, the path is called *Zeno*. For $k \in \mathbb{N}$ with $k \le |\bar{\pi}|$ we let $prefix\_{steps}(\bar{\pi}, k)$ denote the unique prefix $\pi'$ of $\bar{\pi}$ with $|\pi'| = k$, and for $t \in \mathbb{R}\_{\ge 0}$ we let $prefix\_{time}(\bar{\pi}, t)$ denote the largest prefix of $\bar{\pi}$ with total duration at most $t$. The sets of infinite and finite paths of $\mathcal{M}$ are given by $Paths^{\mathcal{M}}\_{inf}$ and $Paths^{\mathcal{M}}\_{fin}$, respectively.

A *component* of $\mathcal{M}$ is a set $C \subseteq MS \cup SA$. We set $states(C) = (C \cap MS) \cup \{s \in PS \mid \exists \alpha \colon \langle s, \alpha \rangle \in C\}$. $C$ is *closed* if $\forall c \in C \colon supp(\mathbf{P}(c)) \subseteq states(C)$, and *connected* if for all $s, s' \in states(C)$ there is $s\_0 \kappa\_1 \dots \kappa\_n s\_n \in Paths\_{fin}$ with $s = s\_0$, $s' = s\_n$, and for each $i \ge 0$ either $s\_i \in C \cap MS$ or $\langle s\_i, \kappa\_{i+1} \rangle \in C \cap SA$. An *end component (EC)* is a closed and connected component. An EC is *maximal* if it is not a proper subset of another EC. $MECS(\mathcal{M})$ denotes the set of maximal ECs of $\mathcal{M}$. For an EC $C$ let $exits(C) = \{\langle s, \alpha \rangle \in SA^{\mathcal{M}} \mid s \in states(C) \text{ and } \langle s, \alpha \rangle \notin C\}$.
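Maximal end components can be computed by the standard iteration: restrict to the state-action pairs whose successors stay inside their strongly connected component and repeat until a fixpoint is reached. A compact sketch for an MDP-like encoding (the encoding is ours; Markovian states can be modeled as states with a single pseudo-action):

```python
def reachable(s, succ):
    seen, stack = {s}, [s]
    while stack:
        v = stack.pop()
        for w in succ.get(v, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def sccs(nodes, succ):
    # O(n^2) strongly connected components via mutual reachability
    # (clearly correct and fine for a sketch; use Tarjan for real models)
    fwd = {s: reachable(s, succ) for s in nodes}
    comps, assigned = [], set()
    for s in nodes:
        if s in assigned:
            continue
        comp = frozenset(t for t in fwd[s] if s in fwd.get(t, ()))
        comps.append(comp)
        assigned |= comp
    return comps

def mecs(mdp):
    """Maximal end components; mdp maps state -> action -> set of successors."""
    mdp = {s: {a: set(ts) for a, ts in acts.items()} for s, acts in mdp.items()}
    while True:
        nodes = set(mdp)
        succ = {s: set().union(*mdp[s].values()) & nodes for s in nodes}
        comp_of = {}
        for comp in sccs(nodes, succ):
            for s in comp:
                comp_of[s] = comp
        changed = False
        for s in list(mdp):
            for a in list(mdp[s]):
                # drop actions that may leave the SCC of their state
                if any(comp_of.get(t) != comp_of[s] for t in mdp[s][a]):
                    del mdp[s][a]
                    changed = True
            if not mdp[s]:  # a state without remaining actions is in no EC
                del mdp[s]
                changed = True
        if not changed:
            return sccs(set(mdp), {s: set().union(*mdp[s].values()) for s in mdp})

mdp = {"s0": {"a": {"s0", "s1"}},
       "s1": {"b": {"s0"}, "c": {"s2"}},
       "s2": {"d": {"s2"}}}
result = mecs(mdp)
print(sorted(sorted(c) for c in result))  # [['s0', 's1'], ['s2']]
```

Here action `c` of `s1` is pruned because it can leave the component {s0, s1}, leaving two maximal ECs.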

Definition 2. *The* sub-MA *of* $\mathcal{M}$ *induced by a closed component* $C$ *is given by* $\mathcal{M} {\restriction} C = \langle states(C), Act, \Delta\_C, \mathbf{P}\_C \rangle$ *where* $\Delta\_C(s) = \Delta(s)$ *if* $s \in C \cap MS^{\mathcal{M}}$ *and otherwise* $\Delta\_C(s) = \{\alpha \in \Delta(s) \mid \langle s, \alpha \rangle \in C\}$*, and* $\mathbf{P}\_C$ *is the restriction of* $\mathbf{P}$ *to* $C$*.*

A *strategy* for M resolves the nondeterminism at probabilistic states by providing probability distributions over enabled actions based on the execution history.

Definition 3. *A (general)* strategy *for MA* $\mathcal{M} = \langle S, Act, \Delta, \mathbf{P} \rangle$ *is a function* $\sigma \colon Paths\_{fin} \to Dist(Act) \cup \{\tau\}$ *such that for* $\hat{\pi} \in Paths\_{fin}$ *we have* $\sigma(\hat{\pi}) \in Dist(\Delta(last(\hat{\pi})))$ *if* $last(\hat{\pi}) \in PS$ *and* $\sigma(\hat{\pi}) = \tau$ *otherwise.*

A strategy $\sigma$ is called *memoryless* if the choice only depends on the current state, i.e., $\forall \hat{\pi}, \hat{\pi}' \in Paths\_{fin} \colon last(\hat{\pi}) = last(\hat{\pi}')$ implies $\sigma(\hat{\pi}) = \sigma(\hat{\pi}')$. If all assigned distributions are Dirac, $\sigma$ is called *deterministic*. Let $\Sigma^{\mathcal{M}}$ and $\Sigma^{\mathcal{M}}\_{md}$ denote the sets of general and memoryless deterministic strategies of $\mathcal{M}$, respectively. For simplicity, we often interpret $\sigma \in \Sigma^{\mathcal{M}}\_{md}$ as a function $\sigma \colon S \to Act \cup \{\tau\}$. The *induced sub-MA* for $\sigma \in \Sigma^{\mathcal{M}}\_{md}$ is given by $\mathcal{M} {\restriction} (MS \cup \{\langle s, \sigma(s) \rangle \mid s \in PS\})$. Strategy $\sigma \in \Sigma^{\mathcal{M}}$ and initial state $s\_I \in S$ define a *probability measure* $\Pr^{\mathcal{M}, s\_I}\_{\sigma}$ that assigns probabilities to sets of infinite paths [38]. The expected value of $f \colon Paths\_{inf} \to \bar{\mathbb{R}}$ is given by the Lebesgue integral $Ex^{\mathcal{M}, s\_I}\_{\sigma}(f) = \int\_{\pi \in Paths\_{inf}} f(\pi) \, d\Pr^{\mathcal{M}, s\_I}\_{\sigma}(\pi)$.

#### 2.2 Reward-based Objectives

MA can be equipped with *rewards* to model various quantities such as energy consumption or the number of produced units. We distinguish between *transition* rewards $\mathcal{R}_{\mathrm{trans}}\colon (\mathit{MS} \cup \mathit{SA}) \times S \to \mathbb{R}$ that are collected when transitioning from one state to another and *state* rewards $\mathcal{R}_{\mathrm{state}}\colon S \to \mathbb{R}$ that are collected over time, i.e., staying in state $s$ for $t$ time units yields a reward of $\mathcal{R}_{\mathrm{state}}(s) \cdot t$. Since no time passes in probabilistic states, state rewards $\mathcal{R}_{\mathrm{state}}(s)$ for $s \in \mathit{PS}$ are irrelevant. A reward assignment combines the two notions.

Definition 4. *A* reward assignment *for MA* $\mathcal{M}$ *and* $\mathcal{R}_{\mathrm{state}}, \mathcal{R}_{\mathrm{trans}}$ *as above is a function* $\mathcal{R}\colon ((\mathit{MS} \times \mathbb{R}_{\geq 0}) \cup \mathit{SA}) \times S \to \mathbb{R}$ *with*

$$\mathcal{R}(\langle s,\kappa\rangle,s') = \begin{cases} \mathcal{R}\_{\text{state}}(s)\cdot\kappa + \mathcal{R}\_{\text{trans}}(s,s') & \text{if } s \in MS, \kappa \in \mathbb{R}\_{\ge 0} \\ \mathcal{R}\_{\text{trans}}(\langle s,\kappa\rangle,s') & \text{if } s \in PS, \kappa \in \Delta(s). \end{cases}$$

We fix a reward assignment $\mathcal{R}$ for $\mathcal{M}$. $\mathcal{R}$ can also be applied to any sub-MA $\mathcal{M}{\restriction}C$ of $\mathcal{M}$ in a straightforward way. For a component $C \subseteq \mathit{MS} \cup \mathit{SA}$ we write $\mathcal{R}(C) \geq 0$ if all rewards assigned within $C$ are non-negative, formally $\forall\, \langle s, \kappa \rangle \in (C \cap \mathit{SA}) \cup ((C \cap \mathit{MS}) \times \mathbb{R}_{\geq 0})\colon \forall\, s' \in \mathit{states}(C)\colon \mathcal{R}(\langle s, \kappa \rangle, s') \geq 0$. The shortcuts $\mathcal{R}(C) \leq 0$ and $\mathcal{R}(C) = 0$ are defined similarly. The reward of a finite path $\hat{\pi} = s_0 \kappa_1 s_1 \kappa_2 \ldots \kappa_n s_n$ is given by $\mathcal{R}(\hat{\pi}) = \sum_{i=1}^{|\hat{\pi}|} \mathcal{R}(\langle s_{i-1}, \kappa_i \rangle, s_i)$.
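The path-reward sum above is easy to sketch in code. The encoding below (state and transition rewards as plain dictionaries, a path as an alternating state/$\kappa$ list) is our own illustration and not the paper's notation:

```python
# Sketch of R(path) from Definition 4: Markovian steps carry a sojourn time
# (state reward accrues over time), probabilistic steps carry an action.
def path_reward(path, r_state, r_trans):
    """path: list [s0, k1, s1, ..., kn, sn] where k_i is a sojourn time
    (float, Markovian predecessor) or an action (probabilistic predecessor)."""
    total = 0.0
    for i in range(0, len(path) - 1, 2):
        s, kappa, s_next = path[i], path[i + 1], path[i + 2]
        if isinstance(kappa, (int, float)):   # Markovian: dwell for kappa time units
            total += r_state[s] * kappa + r_trans[(s, s_next)]
        else:                                 # probabilistic: action kappa, no time passes
            total += r_trans[((s, kappa), s_next)]
    return total
```

For example, a path that stays in `s0` for 0.5 time units (state reward rate 2, transition reward 1) and then takes action `a` from `s1` to `s2` (transition reward 3) accumulates $2 \cdot 0.5 + 1 + 3 = 5$.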

Definition 5. *The* total reward objective *for reward assignment* $\mathcal{R}$ *is given by* $\operatorname{tot}(\mathcal{R})\colon \mathit{Paths}_{\mathit{inf}} \to \bar{\mathbb{R}}$ *with* $\operatorname{tot}(\mathcal{R})(\pi) = \limsup_{k \to \infty} \mathcal{R}(\mathit{prefix}_{\mathit{steps}}(\pi, k))$*.*

Definition 6. *The* long-run average (LRA) reward objective *for* $\mathcal{R}$ *is given by* $\operatorname{lra}(\mathcal{R})\colon \mathit{Paths}_{\mathit{inf}} \to \bar{\mathbb{R}}$ *with* $\operatorname{lra}(\mathcal{R})(\pi) = \limsup_{t \to \infty} \frac{1}{t} \cdot \mathcal{R}(\mathit{prefix}_{\mathit{time}}(\pi, t))$*.*

Sect. 4 considers assumptions under which the limit in both definitions is attained, i.e., lim sup can be replaced by lim. The incorporation of other objectives such as *reachability probabilities* is discussed in Remark 3.

### 2.3 Markov Decision Processes

A *Markov Decision Process (MDP)* $\mathcal{M}$ is an MA with only probabilistic states, i.e., $\mathit{MS}^{\mathcal{M}} = \emptyset$. All notions above also apply to MDP. However, since all paths of an MDP have duration 0, no timing information is available. For MDP, we therefore usually consider *steps* instead of time. In particular, for a reward assignment $\mathcal{R}$ we consider $\operatorname{lra}_{\mathit{steps}}(\mathcal{R})$ instead of $\operatorname{lra}(\mathcal{R})$, where $\operatorname{lra}_{\mathit{steps}}(\mathcal{R})(\pi) = \limsup_{k \to \infty} \frac{1}{k} \cdot \mathcal{R}(\mathit{prefix}_{\mathit{steps}}(\pi, k))$. Below, we focus on MA. Applying our results to step-based LRA rewards on MDP is straightforward. Time-based LRA reward objectives for MA cannot straightforwardly be reduced to step-based measures for MDP due to the interplay of delayed and undelayed transitions.
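The step-based LRA value can be illustrated on a finite prefix: the running average $\frac{1}{k} \cdot \mathcal{R}(\mathit{prefix}_{\mathit{steps}}(\pi, k))$ of the per-step rewards converges to the LRA reward as $k$ grows. The two-valued example path below is our own, not from the paper:

```python
# Sketch: average reward over the first k steps of a path, given the list of
# per-step rewards R(<s_{i-1}, kappa_i>, s_i) collected along that path.
def lra_steps_prefix(step_rewards, k):
    return sum(step_rewards[:k]) / k

# A path that alternates between rewards 1 and 3 has step-based LRA reward 2:
rewards = [1, 3] * 500
print(lra_steps_prefix(rewards, 1000))  # -> 2.0
```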

### 3 Efficient Multi-objective Model Checking

We formalize common tasks in multi-objective model checking and sketch our solution method based on [28]. We fix an MA $\mathcal{M} = \langle S, \mathit{Act}, \Delta, \mathbf{P} \rangle$ with initial state $s_I \in S$ and $\ell > 0$ objectives $f_1, \ldots, f_\ell\colon \mathit{Paths}_{\mathit{inf}} \to \mathbb{R}$ with $\mathcal{F} = \langle f_1, \ldots, f_\ell \rangle$. The notation for expected values is lifted to tuples: $\mathrm{Ex}_\sigma(\mathcal{F}) = \langle \mathrm{Ex}_\sigma(f_1), \ldots, \mathrm{Ex}_\sigma(f_\ell) \rangle$.

### 3.1 Multi-objective Model Checking Queries

Our aim is to maximize the expected value for each (potentially conflicting) objective f<sup>j</sup> . We impose the following assumption which can be asserted using single-objective model checking. We further discuss the assumption in Remark 2.

Figure 1: (a) MA $\mathcal{M}$ with rewards $\mathcal{R}_1, \mathcal{R}_2$; (b) achievable points $\mathit{Ach}(\mathcal{F})$ (green) and Pareto front $\mathit{Pareto}(\mathcal{F})$ (blue) for $\mathcal{F} = \langle \operatorname{lra}(\mathcal{R}_1), \operatorname{tot}(\mathcal{R}_2) \rangle$

Assumption 1 (Objective Finiteness) $\forall\, j\colon \sup \{\mathrm{Ex}_\sigma(f_j) \mid \sigma \in \Sigma\} < \infty$*.*

Definition 7. *For* $\mathcal{F}$ *as above,* $\mathit{Ach}(\mathcal{F}) = \{\mathbf{p} \in \mathbb{R}^\ell \mid \exists\, \sigma \in \Sigma\colon \mathbf{p} \leq \mathrm{Ex}_\sigma(\mathcal{F})\}$ *is the set of* achievable points*. The* Pareto front *is given by* $\mathit{Pareto}(\mathcal{F}) = \{\mathbf{p} \in \mathrm{cl}(\mathit{Ach}(\mathcal{F})) \mid \forall\, \mathbf{p}' \gneq \mathbf{p}\colon \mathbf{p}' \notin \mathrm{cl}(\mathit{Ach}(\mathcal{F}))\}$*.*

A point $\mathbf{p} \in \mathit{Ach}(\mathcal{F})$ is called *achievable*: there is a single strategy $\sigma$ that *achieves* for each objective $f_j$ an expected value of at least $\mathbf{p}[j]$. Due to Assumption 1, the Pareto front is the *frontier* of the set of achievable points, meaning that it is the smallest set $P \subseteq \mathbb{R}^\ell$ with $\operatorname{dwconv}(P) = \mathrm{cl}(\mathit{Ach}(\mathcal{F}))$. We can thus interpret $\mathit{Pareto}(\mathcal{F})$ as a representation of $\mathrm{cl}(\mathit{Ach}(\mathcal{F}))$ and vice versa. The set of achievable points is closed iff all points on the Pareto front are achievable.

*Example 1.* Fig. 1a shows an MA with initial state $s_3$. Transitions are annotated with actions, rates (boldfaced), and successor probabilities. We also depict two reward assignments $\mathcal{R}_1$ and $\mathcal{R}_2$ by labeling states and transitions with tuples $\langle r_1, r_2 \rangle$ where, e.g., $\mathcal{R}_2(\langle s_3, \alpha \rangle, s_1) = -1$ and for $t \in \mathbb{R}_{\geq 0}$: $\mathcal{R}_1(\langle s_2, t \rangle, s_4) = 6 \cdot t$.

For $\sigma_1 \in \Sigma_{\mathit{md}}$ with $\sigma_1\colon s_3, s_4 \mapsto \alpha$, the EC $\{s_2, \langle s_4, \alpha \rangle, \langle s_4, \beta \rangle, s_6\}$ is reached almost surely (with probability 1), yielding $\mathrm{Ex}_{\sigma_1}(\operatorname{lra}(\mathcal{R}_1)) = 0.6 \cdot 6 + 0.4 \cdot 1 = 4$ and $\mathrm{Ex}_{\sigma_1}(\operatorname{tot}(\mathcal{R}_2)) = \sum_{i=0}^{\infty} -1 \cdot (0.5)^i = -2$. It follows that the point $\mathbf{p}_1 = \langle 4, -2 \rangle$ as indicated in Fig. 1b is achievable. Similarly, $\sigma_2 \in \Sigma_{\mathit{md}}$ with $\sigma_2\colon s_3 \mapsto \beta, s_4 \mapsto \alpha$ achieves the point $\mathbf{p}_2 = \langle 3, 0 \rangle$. With strategies that randomly pick an action at $s_3$, we can also achieve any point on the blue line in Fig. 1b that connects $\mathbf{p}_1$ and $\mathbf{p}_2$. This line coincides with the Pareto front $\mathit{Pareto}(\mathcal{F})$ for $\mathcal{F} = \langle \operatorname{lra}(\mathcal{R}_1), \operatorname{tot}(\mathcal{R}_2) \rangle$. The set of achievable points $\mathit{Ach}(\mathcal{F})$ (indicated in green) coincides with the downward convex hull of the Pareto front.
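The concrete numbers in this example are easy to check mechanically. The small sketch below (our own, not tool output) evaluates the geometric series for $\operatorname{tot}(\mathcal{R}_2)$ under $\sigma_1$ and the convex combinations obtained by randomizing at $s_3$:

```python
# Expected total reward under sigma_1: sum_{i>=0} -1 * 0.5**i = -2.
tot_r2 = sum(-1 * 0.5**i for i in range(200))
print(round(tot_r2, 6))  # -> -2.0

# Randomizing at s3 (probability w for alpha) achieves the convex combination
# of p1 = (4, -2) and p2 = (3, 0):
def mix(w, p1=(4.0, -2.0), p2=(3.0, 0.0)):
    return tuple(w * a + (1 - w) * b for a, b in zip(p1, p2))

print(mix(0.5))  # -> (3.5, -1.0)
```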

For multi-objective model checking we are concerned with the following queries:


Input : MA $\mathcal{M}$ with initial state $s_I$, objectives $\mathcal{F} = \langle f_1, \ldots, f_\ell \rangle$
Output : An approximation of $\mathit{Ach}(\mathcal{F})$

1: $P \leftarrow \emptyset$ *// Collects achievable points found so far.*
2: $Q \leftarrow \mathbb{R}^\ell$ *// Excludes points that are known to be unachievable.*
3: repeat
4: &emsp; Select weights $\mathbf{w} \in \{\mathbf{w} \in (\mathbb{R}_{\geq 0})^\ell \mid \sum_{j=1}^{\ell} \mathbf{w}[j] = 1\}$ and $\varepsilon > 0$
5: &emsp; Find $v_{\mathbf{w}} \geq \sup \{\mathbf{w} \cdot \mathrm{Ex}_\sigma(\mathcal{F}) \mid \sigma \in \Sigma\}$ and $\sigma_{\mathbf{w}} \in \Sigma$ s.t. $|v_{\mathbf{w}} - \mathbf{w} \cdot \mathrm{Ex}_{\sigma_{\mathbf{w}}}(\mathcal{F})| \leq \varepsilon$
6: &emsp; Compute $\mathbf{p_w} \in \mathbb{R}^\ell$ with $\forall\, j\colon \mathbf{p_w}[j] = \mathrm{Ex}_{\sigma_{\mathbf{w}}}(f_j)$
7: &emsp; $P \leftarrow P \cup \{\mathbf{p_w}\}$; $\;Q \leftarrow Q \cap \{\mathbf{p} \in \mathbb{R}^\ell \mid \mathbf{w} \cdot \mathbf{p} \leq v_{\mathbf{w}}\}$
8: until approximation $\operatorname{dwconv}(P) \subseteq \mathit{Ach}(\mathcal{F}) \subseteq Q$ answers the multi-objective query

Algorithm 1: Approximating the set of achievable points

### 3.2 Approximation of Achievable Points

A practically efficient approach that tackles the above queries for expected total rewards in MDP was given in [28]. It is based on so-called *sandwich algorithms* known from convex multi-objective optimization [53,51]. We extend the algorithm to arbitrary combinations of objectives f<sup>j</sup> on MA, including—and this is the main algorithmic novelty—mixtures of total- and LRA reward objectives.

The idea is to iteratively refine an approximation of the set of achievable points $\mathit{Ach}(\mathcal{F})$. The refinement loop is outlined in Algorithm 1. At the start of each iteration, the algorithm chooses a weight vector $\mathbf{w}$ and a precision parameter $\varepsilon$ according to a heuristic (details below). Then, Line 5 considers the weighted sum of the expected values of the objectives $f_j$. More precisely, an upper bound $v_{\mathbf{w}}$ for $\sup \{\mathbf{w} \cdot \mathrm{Ex}_\sigma(\mathcal{F}) \mid \sigma \in \Sigma\}$ as well as a "near optimal" strategy $\sigma_{\mathbf{w}}$ need to be found such that the difference between the bound $v_{\mathbf{w}}$ and the weighted sum induced by $\sigma_{\mathbf{w}}$ is at most $\varepsilon$. In Sect. 4, we outline the computation of $v_{\mathbf{w}}$ and $\sigma_{\mathbf{w}}$ for the case where $\mathcal{F}$ consists of total- and LRA reward objectives. Next, in Line 6 the algorithm computes a point $\mathbf{p_w}$ that contains the expected values for each individual objective $f_j$ under strategy $\sigma_{\mathbf{w}}$. These values can be computed using off-the-shelf single-objective model checking algorithms on the model induced by $\sigma_{\mathbf{w}}$. By definition, $\mathbf{p_w}$ is achievable. Finally, Line 7 inserts the found point into the initially empty set $P$ and excludes points from the set $Q$ (which initially contains all points) that are known to be unachievable. The following theorem establishes the correctness of the approach. We prove it using Lemmas 1 and 2.
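To make the loop concrete, here is a toy instantiation (entirely our own construction): the weighted-sum oracle of Line 5 is exact ($\varepsilon = 0$) and simply maximizes over a finite set of known strategy value vectors, as would be the case for the memoryless deterministic strategies of a small two-objective model:

```python
# Toy sandwich loop: P collects achievable points, Q is represented by the
# halfspace constraints {p | w . p <= v_w} accumulated so far.
def weighted_sum_oracle(points, w):
    """Return v_w = max_sigma w . Ex_sigma(F) and an optimal point p_w."""
    best = max(points, key=lambda p: w[0] * p[0] + w[1] * p[1])
    return w[0] * best[0] + w[1] * best[1], best

def refine(points, weights):
    P, halfspaces = [], []
    for w in weights:
        v_w, p_w = weighted_sum_oracle(points, w)   # Line 5 (exact oracle)
        P.append(p_w)                               # Lines 6-7
        halfspaces.append((w, v_w))
    return P, halfspaces

# Value vectors of the two deterministic strategies from Example 1:
P, Q = refine([(4.0, -2.0), (3.0, 0.0)], [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)])
# Invariant of Theorem 1: every found point lies inside every halfspace of Q.
assert all(w[0] * p[0] + w[1] * p[1] <= v + 1e-9 for p in P for (w, v) in Q)
```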

Theorem 1. *Algorithm 1 maintains the invariant* $\operatorname{dwconv}(P) \subseteq \mathit{Ach}(\mathcal{F}) \subseteq Q$*.*

Lemma 1. $\forall\, \mathbf{p} \in \mathit{Ach}(\mathcal{F}), \mathbf{w} \in (\mathbb{R}_{\geq 0})^\ell\colon \mathbf{w} \cdot \mathbf{p} \leq \sup \{\mathbf{w} \cdot \mathrm{Ex}_\sigma(\mathcal{F}) \mid \sigma \in \Sigma\}$*.*

*Proof.* Let **<sup>p</sup>** <sup>∈</sup> Ach(F) be achieved by strategy <sup>σ</sup>**<sup>p</sup>** <sup>∈</sup> <sup>Σ</sup>. The claim follows from

$$\mathbf{w} \cdot \mathbf{p} = \sum\_{j=1}^{\ell} \mathbf{w}[j] \cdot \mathbf{p}[j] \le \sum\_{j=1}^{\ell} \mathbf{w}[j] \cdot \mathbf{Ex}\_{\sigma\_{\mathbf{p}}}(f\_j) \le \sup \left\{ \sum\_{j=1}^{\ell} \mathbf{w}[j] \cdot \mathbf{Ex}\_{\sigma}(f\_j) \, \Big| \, \sigma \in \Sigma \right\}.$$

Lemma 2. Ach(F) *is convex, i.e.,* Ach(F) = conv(Ach(F))*.*

*Proof.* We need to show that for two points $\mathbf{p}_1, \mathbf{p}_2 \in \mathit{Ach}(\mathcal{F})$ with achieving strategies $\sigma_1, \sigma_2 \in \Sigma$, any point $\mathbf{p}$ on the line connecting $\mathbf{p}_1$ and $\mathbf{p}_2$ is also achievable. Formally, for $w \in [0, 1]$ we show that $\mathbf{p}_w = w \cdot \mathbf{p}_1 + (1-w) \cdot \mathbf{p}_2 \in \mathit{Ach}(\mathcal{F})$. Consider the strategy $\sigma_w$ that initially makes a coin flip<sup>1</sup>: with probability $w$ it mimics $\sigma_1$ and otherwise it mimics $\sigma_2$. We can show for all objectives $f_j$:

$$\mathbf{p\_w}[j] = w \cdot \mathbf{p\_1}[j] + (1 - w) \cdot \mathbf{p\_2}[j] \le w \cdot \text{Ex}\_{\sigma\_1}(f\_j) + (1 - w) \cdot \text{Ex}\_{\sigma\_2}(f\_j) = \text{Ex}\_{\sigma\_w}(f\_j).$$

We now show Theorem 1. A similar proof was given in [28].

*Proof (of Theorem 1).* All $\mathbf{p_w} \in P$ are achievable, i.e., $P \subseteq \mathit{Ach}(\mathcal{F})$. By Definition 7 and Lemma 2 we get $\operatorname{dwconv}(P) \subseteq \operatorname{dwconv}(\mathit{Ach}(\mathcal{F})) = \operatorname{conv}(\mathit{Ach}(\mathcal{F})) = \mathit{Ach}(\mathcal{F})$. Now let $\mathbf{p} \in \mathit{Ach}(\mathcal{F})$ and let $\mathbf{w}$ be an arbitrary weight vector considered in some iteration of Algorithm 1 with corresponding value $v_{\mathbf{w}}$ computed in Line 5. Lemma 1 yields $\mathbf{w} \cdot \mathbf{p} \leq \sup \{\mathbf{w} \cdot \mathrm{Ex}_\sigma(\mathcal{F}) \mid \sigma \in \Sigma\} \leq v_{\mathbf{w}}$ and thus $\mathbf{p} \in Q$.

Algorithm 1 can be stopped at any time and the current approximation of $\mathit{Ach}(\mathcal{F})$ can be used to (i) decide qualitative achievability, (ii) provide a lower and an upper bound for quantitative achievability, and (iii) obtain an approximative representation of the Pareto front.

The *precision parameter* $\varepsilon$ can be decreased dynamically to obtain a gradually finer approximation. If $\mathit{Ach}(\mathcal{F})$ is closed, the supremum $\sup \{\mathbf{w} \cdot \mathrm{Ex}_\sigma(\mathcal{F}) \mid \sigma \in \Sigma\}$ is attained by some strategy $\sigma_{\mathbf{w}}$, allowing us to set $\varepsilon = 0$.

We briefly sketch the *selection of weight vectors* as proposed in [28]. In the first iterations of Algorithm 1, we optimize each objective $f_j$ individually, i.e., we consider for each $j$ the weight vector $\mathbf{w}$ with $\mathbf{w}[i] = 0$ for $i \neq j$ and $\mathbf{w}[j] = 1$. After that, we consider weight vectors that are orthogonal to a facet of the downward convex hull of the current set of points $P$. To approximate the Pareto front, facets with a large distance to $\mathbb{R}^\ell \setminus Q$ are considered first. To answer a qualitative or quantitative achievability query, the selection can be guided further based on the input point $\mathbf{p} \in \mathbb{R}^\ell$ or the input values $p_2, p_3, \ldots, p_\ell \in \mathbb{R}$. More details and further discussions on these heuristics can be found in [28].
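In the two-objective case the facet-orthogonal weights are easy to compute by hand. The helper below is our own 2D sketch of this heuristic, assuming both objectives are maximized, so adjacent points on the downward convex hull are connected by a segment of non-positive slope:

```python
# Weight vector orthogonal to the facet spanned by two hull points p and q
# (2D only; normalized so its entries sum to 1).
def facet_weight(p, q):
    d = (q[0] - p[0], q[1] - p[1])   # facet direction
    n = (abs(d[1]), abs(d[0]))       # a nonnegative normal to d
    s = n[0] + n[1]
    return (n[0] / s, n[1] / s)
```

For the points $\mathbf{p}_1 = \langle 4, -2 \rangle$ and $\mathbf{p}_2 = \langle 3, 0 \rangle$ from Example 1 this yields $\mathbf{w} = (2/3, 1/3)$, under which both points have the same weighted value $2$, as expected for a supporting hyperplane of the facet.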

*Remark 2.* Assumption 1 does not exclude $\mathrm{Ex}_\sigma(f_j) = -\infty$, which occurs, e.g., when objectives reflect resource consumption and some (bad) strategies require infinite resources. Moreover, if Assumption 1 is violated for an objective $f_j$, then for this objective any (arbitrarily high) value $p \in \mathbb{R}$ can be achieved by some strategy $\sigma \in \Sigma$ with $p \leq \mathrm{Ex}_\sigma(f_j)$. Similar to the proof of Lemma 2, a strategy can be constructed that, with a small probability, mimics a strategy inducing a very high expected value for $f_j$ and, with the remaining (high) probability, optimizes for the other objectives. Let $\mathcal{F}_{-j}$ be the tuple $\mathcal{F}$ without $f_j$ and similarly for $\mathbf{p} \in \mathbb{R}^\ell$ let $\mathbf{p}_{-j} \in \mathbb{R}^{\ell-1}$ be the point $\mathbf{p}$ without the $j$th entry. Assuming $\inf \{\mathrm{Ex}_\sigma(f_j) \mid \sigma \in \Sigma\} > -\infty$, we can show that $\mathrm{cl}(\mathit{Ach}(\mathcal{F})) = \{\mathbf{p} \in \mathbb{R}^\ell \mid \mathbf{p}_{-j} \in \mathrm{cl}(\mathit{Ach}(\mathcal{F}_{-j}))\}$. Put differently, $\mathrm{cl}(\mathit{Ach}(\mathcal{F}))$ can be constructed from the achievable points obtained without the objective $f_j$.

<sup>1</sup> Strategies as in Definition 3 cannot "store" the outcome of the initial coin flip. Thus, given $\hat{\pi} \in \mathit{Paths}_{\mathit{fin}}$, strategy $\sigma_w$ actually has to consider the *conditional* probability for the outcome of the coin flip, given that $\hat{\pi}$ has been observed. Alternatively, we could also have introduced strategies with memory.

### 4 Optimizing Weighted Combinations of Objectives

We now analyze weighted sums of expected values as in Line 5 of Algorithm 1.

Weighted Sum Optimization Problem
Input: MA $\mathcal{M}$ with initial state $s_I$, objectives $\mathcal{F} = \langle f_1, \ldots, f_\ell \rangle$, weight vector $\mathbf{w} \in \{\mathbf{w} \in (\mathbb{R}_{\geq 0})^\ell \mid \sum_{j=1}^{\ell} \mathbf{w}[j] = 1\}$, precision $\varepsilon > 0$
Output: Value $v_{\mathbf{w}} \in \mathbb{R}$ with $v_{\mathbf{w}} \geq \sup \{\mathbf{w} \cdot \mathrm{Ex}_\sigma(\mathcal{F}) \mid \sigma \in \Sigma\}$ and strategy $\sigma_{\mathbf{w}} \in \Sigma$ such that $|v_{\mathbf{w}} - \mathbf{w} \cdot \mathrm{Ex}_{\sigma_{\mathbf{w}}}(\mathcal{F})| \leq \varepsilon$.

We only consider total- and LRA reward objectives. Remark 3 discusses other objectives. We show that instead of a weighted sum of the expected values we can consider weighted sums of the rewards. This allows us to combine all objectives into a single reward assignment and then apply single-objective model checking.

### 4.1 Pure Long-run Average Queries

Initially, we restrict ourselves to LRA objectives and show a reduction of the weighted sum optimization problem to a single-objective long-run average reward computation. As usual for MA [38,29,17] we forbid so-called Zeno behavior.

Assumption 2 (Non-Zenoness) $\forall\, \sigma \in \Sigma^{\mathcal{M}}\colon \mathrm{Pr}^{\mathcal{M}}_{\sigma}(\{\pi \mid \mathit{dur}(\pi) < \infty\}) = 0$*.*

The assumption is equivalent to assuming that every EC of $\mathcal{M}$ contains at least one Markovian state. If the assumption holds, the limit in Definition 6 is attained almost surely (with probability 1) and corresponds to a value $v \in \mathbb{R}$. Thus, Assumption 1 for LRA objectives is already implied by Assumption 2. Let $\mathcal{F}_{\mathit{lra}} = \langle \operatorname{lra}(\mathcal{R}_1), \ldots, \operatorname{lra}(\mathcal{R}_\ell) \rangle$ with reward assignments $\mathcal{R}_j$. Moreover, for weight vector $\mathbf{w}$ let $\mathcal{R}_{\mathbf{w}}$ be the reward assignment with $\mathcal{R}_{\mathbf{w}}(\langle s, \kappa \rangle, s') = \sum_{j=1}^{\ell} \mathbf{w}[j] \cdot \mathcal{R}_j(\langle s, \kappa \rangle, s')$.

Theorem 2. $\forall\, \sigma \in \Sigma\colon \mathbf{w} \cdot \mathrm{Ex}_\sigma(\mathcal{F}_{\mathit{lra}}) = \mathrm{Ex}_\sigma(\operatorname{lra}(\mathcal{R}_{\mathbf{w}}))$*.*

*Proof.* Due to Assumption 2 we have for almost all paths $\pi \in \mathit{Paths}_{\mathit{inf}}$ that for all $j \in \{1, \ldots, \ell\}$ the limit $\lim_{t \to \infty} \frac{1}{t} \cdot \mathcal{R}_j(\mathit{prefix}_{\mathit{time}}(\pi, t))$ exists and

$$\sum_{j=1}^{\ell} \mathbf{w}[j] \cdot \operatorname{lra}(\mathcal{R}_j)(\pi) = \lim_{t \to \infty} \frac{1}{t} \cdot \sum_{j=1}^{\ell} \mathbf{w}[j] \cdot \mathcal{R}_j(\mathit{prefix}_{\mathit{time}}(\pi, t)) = \operatorname{lra}(\mathcal{R}_{\mathbf{w}})(\pi).$$

The theorem follows with

$$\sum_{j=1}^{\ell} \mathbf{w}[j] \cdot \operatorname{Ex}_{\sigma}(\operatorname{lra}(\mathcal{R}_j)) = \int_{\pi} \sum_{j=1}^{\ell} \mathbf{w}[j] \cdot \operatorname{lra}(\mathcal{R}_j)(\pi) \,\mathrm{d}\mathrm{Pr}_{\sigma}(\pi) = \operatorname{Ex}_{\sigma}(\operatorname{lra}(\mathcal{R}_{\mathbf{w}})).$$

Due to Theorem 2, it suffices to consider the expected LRA reward for the *single* reward assignment $\mathcal{R}_{\mathbf{w}}$. The supremum $\sup \{\mathrm{Ex}_\sigma(\operatorname{lra}(\mathcal{R}_{\mathbf{w}})) \mid \sigma \in \Sigma\}$ is attained by some memoryless deterministic strategy $\sigma_{\mathbf{w}} \in \Sigma_{\mathit{md}}$ [30]. Such a strategy and the induced value $v_{\mathbf{w}} = \mathrm{Ex}_{\sigma_{\mathbf{w}}}(\operatorname{lra}(\mathcal{R}_{\mathbf{w}}))$ can be computed (or approximated) with *linear programming* [30], *strategy iteration* [42] or *value iteration* [17,1].

#### 4.2 A Two-phase Approach for Single-objective LRA

The computation of single-objective expected LRA rewards for reward assignment $\mathcal{R}_{\mathbf{w}}$ can be divided into two phases [29,17,1]. First, each maximal end component $C \in \mathit{MECS}(\mathcal{M})$ is analyzed individually by computing for the sub-MA $\mathcal{M}{\restriction}C$ and some<sup>2</sup> $s \in \mathit{states}(C)$ the value $v_C = \max \{\mathrm{Ex}^{\mathcal{M}{\restriction}C, s}_{\sigma}(\operatorname{lra}(\mathcal{R}_{\mathbf{w}})) \mid \sigma \in \Sigma^{\mathcal{M}{\restriction}C}_{\mathit{md}}\}$.

Secondly, we consider a quotient model $\mathcal{M}' = \mathcal{M}_{/\mathit{MECS}(\mathcal{M})}$ of $\mathcal{M}$ that replaces the states of each $C \in \mathit{MECS}(\mathcal{M})$ by a single state.

Definition 8. *For* $\mathcal{M} = \langle S, \mathit{Act}, \Delta, \mathbf{P} \rangle$ *and a set of ECs* $\mathcal{C}$*, the* quotient *is the MA* $\mathcal{M}_{/\mathcal{C}} = \langle S_{/\mathcal{C}}, \mathit{Act}_{/\mathcal{C}}, \Delta_{/\mathcal{C}}, \mathbf{P}_{/\mathcal{C}} \rangle$ *where*

- $S_{/\mathcal{C}} = \bigl(S \setminus \bigcup_{C \in \mathcal{C}} \mathit{states}(C)\bigr) \cup \mathcal{C} \cup \{s_\perp\}$ and $\mathit{Act}_{/\mathcal{C}} = \mathit{Act} \cup \bigcup_{C \in \mathcal{C}} \mathit{exits}(C) \cup \{\perp\}$,
- $\Delta_{/\mathcal{C}}(\hat{s}) = \begin{cases} \Delta(\hat{s}) & \text{if } \hat{s} \in S \\ \mathit{exits}(\hat{s}) \cup \{\perp\} & \text{if } \hat{s} \in \mathcal{C} \\ \{\perp\} & \text{if } \hat{s} = s_\perp, \end{cases}$ and
- $\mathbf{P}_{/\mathcal{C}}(c) = \begin{cases} \mathbf{P}(c) & \text{if } c \in \mathit{MS}^{\mathcal{M}} \cup \mathit{SA}^{\mathcal{M}} \\ \mathbf{P}(\langle s, \alpha \rangle) & \text{if } c = \langle C, \langle s, \alpha \rangle \rangle \text{ for } C \in \mathcal{C} \text{ and } \langle s, \alpha \rangle \in \mathit{exits}(C) \\ \{s_\perp \mapsto 1\} & \text{if } c \in \mathcal{C} \times \{\perp\} \cup \{s_\perp\}. \end{cases}$

Intuitively, selecting action $\perp$ at a state $C \in \mathit{MECS}(\mathcal{M})$ of $\mathcal{M}'$ reflects any strategy of $\mathcal{M}$ that upon visiting the EC $C$ will stay in this EC forever. We can thus mimic any strategy of the sub-MA $\mathcal{M}{\restriction}C$, in particular a memoryless deterministic strategy that maximizes the expected value of $\operatorname{lra}(\mathcal{R}_{\mathbf{w}})$ in $\mathcal{M}{\restriction}C$. Contrarily, selecting an action $\langle s, \alpha \rangle$ at a state $C$ of $\mathcal{M}'$ reflects a strategy of $\mathcal{M}$ that upon visiting the EC $C$ enforces that the states of $C$ will be left via the exiting state-action pair $\langle s, \alpha \rangle$. Let $\mathcal{R}^*$ be the reward assignment for $\mathcal{M}'$ that yields $\mathcal{R}^*(\langle C, \perp \rangle, s_\perp) = v_C$ and 0 in all other cases. It can be shown that $\max \{\mathrm{Ex}^{\mathcal{M}, s_I}_{\sigma}(\operatorname{lra}(\mathcal{R}_{\mathbf{w}})) \mid \sigma \in \Sigma^{\mathcal{M}}\} = \max \{\mathrm{Ex}^{\mathcal{M}', s'_I}_{\sigma}(\operatorname{tot}(\mathcal{R}^*)) \mid \sigma \in \Sigma^{\mathcal{M}'}\}$, where $s'_I = C_I$ if $s_I$ is contained in some $C_I \in \mathit{MECS}(\mathcal{M})$ and $s'_I = s_I$ otherwise.

The maximal total reward in $\mathcal{M}'$ can be computed using standard techniques such as *value iteration* and *policy iteration* [46] as well as the more recent *sound value iteration* and *optimistic value iteration* [48,36]. The latter two provide sound precision guarantees for the output value $v$, i.e., $|v - \max \{\mathrm{Ex}^{\mathcal{M}', s'_I}_{\sigma}(\operatorname{tot}(\mathcal{R}^*)) \mid \sigma \in \Sigma^{\mathcal{M}'}\}| \leq \varepsilon$ for a given $\varepsilon > 0$.
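As an illustration of the value-iteration step, here is a minimal sketch with a plain stopping criterion and a made-up example MDP (sound and optimistic value iteration refine exactly this loop with certified error bounds; this sketch provides no such guarantee):

```python
# Gauss-Seidel value iteration for maximal expected total rewards, assuming all
# end components except the absorbing states were already removed so that the
# iteration converges.
def total_reward_vi(mdp, rewards, absorbing, tol=1e-10):
    """mdp: state -> {action: [(prob, successor), ...]};
    rewards: (state, action) -> collected reward; absorbing states get value 0."""
    v = {s: 0.0 for s in mdp}
    while True:
        delta = 0.0
        for s in mdp:
            if s in absorbing:
                continue
            best = max(
                rewards.get((s, a), 0.0) + sum(p * v[t] for p, t in succs)
                for a, succs in mdp[s].items()
            )
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < tol:
            return v

mdp = {
    's0': {'a': [(0.5, 's1'), (0.5, 'goal')], 'b': [(1.0, 'goal')]},
    's1': {'a': [(1.0, 'goal')]},
    'goal': {},
}
rewards = {('s0', 'a'): 1.0, ('s0', 'b'): 0.5, ('s1', 'a'): 2.0}
v = total_reward_vi(mdp, rewards, absorbing={'goal'})
print(v['s0'])  # -> 2.0  (action a: 1 + 0.5*2 + 0.5*0 = 2, beating b's 0.5)
```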

#### 4.3 Combining Long-run Average and Total Rewards

We now consider arbitrary combinations of total and long-run average reward objectives $\mathcal{F} = \langle \operatorname{tot}(\mathcal{R}_1), \ldots, \operatorname{tot}(\mathcal{R}_k), \operatorname{lra}(\mathcal{R}_{k+1}), \ldots, \operatorname{lra}(\mathcal{R}_\ell) \rangle$ with $0 < k < \ell$.

The above-mentioned procedure for LRA reduces the analysis to an expected total reward computation on the quotient model $\mathcal{M}_{/\mathit{MECS}(\mathcal{M})}$. This approach suggests also incorporating the other total-reward objectives for $\mathcal{M}$ in the quotient

<sup>2</sup> The value v<sup>C</sup> does not depend on the selected state s. Intuitively, this is because any other state s ∈ *states*(C) can be reached from s almost surely.

model. However, special care has to be taken concerning total rewards collected within ECs of $\mathcal{M}$, which are no longer present in the quotient $\mathcal{M}_{/\mathit{MECS}(\mathcal{M})}$. We deal with this issue by considering the quotient only for ECs in which no (total) reward is collected. We start by restricting the (total) rewards that may be assigned to transitions within ECs.

Assumption 3 (Sign-Consistency) *For all total reward objectives* $\operatorname{tot}(\mathcal{R}_j)$ *either* $\forall\, C \in \mathit{MECS}(\mathcal{M})\colon \mathcal{R}_j(C) \geq 0$ *or* $\forall\, C \in \mathit{MECS}(\mathcal{M})\colon \mathcal{R}_j(C) \leq 0$*.*

The assumption implies that paths collecting infinitely much positive *and* infinitely much negative reward have probability 0. One consequence is that the limit in Definition 5 exists for almost all paths [3]. A discussion of objectives $\operatorname{tot}(\mathcal{R}_j)$ that violate Assumption 3 for single-objective MDP is given in [3]. Their multi-objective treatment is left for future work.

When Assumptions 1 and 3 hold, we get $\mathcal{R}_i(C) \leq 0$ for all objectives $\operatorname{tot}(\mathcal{R}_i)$ and ECs $C$. Put differently, all non-zero total rewards collected in an EC have to be negative. Strategies that induce a total reward of $-\infty$ for some objective $\operatorname{tot}(\mathcal{R}_i)$ are not taken into account for the set of achievable points. Therefore, transitions within ECs that yield negative reward should only be taken finitely often. These transitions can be disregarded when computing the expected LRA rewards, i.e., only the 0-ECs [3] are relevant for the LRA computation.

Definition 9. *A 0-EC of* $\mathcal{M}$ *and* $\mathcal{R}_1, \ldots, \mathcal{R}_k$ *is an EC* $C$ *of* $\mathcal{M}$ *with* $\mathcal{R}_i(C) = 0$ *for all* $\mathcal{R}_i$*. The set of maximal 0-ECs is given by* $\mathit{MECS}^0(\mathcal{M}, \mathcal{R}_1, \ldots, \mathcal{R}_k)$*.*

$\mathit{MECS}^0(\mathcal{M}, \mathcal{R}_1, \ldots, \mathcal{R}_k)$ can be computed by constructing the maximal ECs of the sub-MA of $\mathcal{M}$ in which all transitions with a non-zero reward are erased.
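A standard way to compute maximal ECs, and hence $\mathit{MECS}^0$ after erasing state-action pairs with non-zero reward, is to alternate SCC decomposition with the removal of actions that may leave their SCC until a fixpoint is reached. The sketch below uses our own MDP-style encoding (ignoring Markovian timing):

```python
def sccs(nodes, edges):
    """Tarjan's SCC algorithm (recursive); edges(v) yields successors of v."""
    index, low, on_stack, stack, comps = {}, {}, set(), [], []
    def strong(v):
        index[v] = low[v] = len(index)
        stack.append(v); on_stack.add(v)
        for w in edges(v):
            if w not in index:
                strong(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:            # v is the root of an SCC
            comp = set()
            while True:
                w = stack.pop(); on_stack.discard(w); comp.add(w)
                if w == v:
                    break
            comps.append(comp)
    for v in nodes:
        if v not in index:
            strong(v)
    return comps

def mecs(enabled, succ):
    """enabled: state -> set of actions; succ: (state, action) -> set of successors."""
    act = {s: set(a) for s, a in enabled.items()}
    while True:
        comps = sccs(act, lambda s: {t for a in act[s] for t in succ[(s, a)]})
        comp_of = {s: i for i, comp in enumerate(comps) for s in comp}
        changed = False
        for s in act:
            for a in list(act[s]):
                # drop actions that may leave the SCC of s
                if any(comp_of[t] != comp_of[s] for t in succ[(s, a)]):
                    act[s].discard(a)
                    changed = True
        if not changed:
            # keep components in which every state can stay forever
            return [c for c in comps if all(act[s] for s in c)]
```

For instance, with `s1` able to escape a cycle `s0 <-> s1` via action `b` into a self-looping `s2`, the procedure removes `b` and reports the two MECs `{s0, s1}` and `{s2}`.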

We are ready to describe our approach that combines LRA rewards of 0-ECs and the remaining total rewards into a single total-reward objective. Let $\mathcal{R}^{\mathit{tot}}_{\mathbf{w}}$ and $\mathcal{R}^{\mathit{lra}}_{\mathbf{w}}$ be reward assignments with $\mathcal{R}^{\mathit{tot}}_{\mathbf{w}}(\langle s, \kappa \rangle, s') = \sum_{i=1}^{k} \mathbf{w}[i] \cdot \mathcal{R}_i(\langle s, \kappa \rangle, s')$ and $\mathcal{R}^{\mathit{lra}}_{\mathbf{w}}(\langle s, \kappa \rangle, s') = \sum_{j=k+1}^{\ell} \mathbf{w}[j] \cdot \mathcal{R}_j(\langle s, \kappa \rangle, s')$. Moreover, for $\pi \in \mathit{Paths}_{\mathit{inf}}$ we set $(\operatorname{tot}(\mathcal{R}^{\mathit{tot}}_{\mathbf{w}}) + \operatorname{lra}(\mathcal{R}^{\mathit{lra}}_{\mathbf{w}}))(\pi) = \operatorname{tot}(\mathcal{R}^{\mathit{tot}}_{\mathbf{w}})(\pi) + \operatorname{lra}(\mathcal{R}^{\mathit{lra}}_{\mathbf{w}})(\pi)$.

Theorem 3. $\forall\, \sigma \in \Sigma\colon \mathbf{w} \cdot \mathrm{Ex}_\sigma(\mathcal{F}) = \mathrm{Ex}_\sigma(\operatorname{tot}(\mathcal{R}^{\mathit{tot}}_{\mathbf{w}}) + \operatorname{lra}(\mathcal{R}^{\mathit{lra}}_{\mathbf{w}}))$*.*

*Proof.* Using a similar reasoning as in the proof of Theorem 2, we get:

$$\begin{split} \mathbf{w} \cdot \operatorname{Ex}_{\sigma}(\mathcal{F}) &= \left( \sum_{i=1}^{k} \mathbf{w}[i] \cdot \operatorname{Ex}_{\sigma}(\operatorname{tot}(\mathcal{R}_i)) \right) + \left( \sum_{j=k+1}^{\ell} \mathbf{w}[j] \cdot \operatorname{Ex}_{\sigma}(\operatorname{lra}(\mathcal{R}_j)) \right) \\ &= \operatorname{Ex}_{\sigma}(\operatorname{tot}(\mathcal{R}^{\mathit{tot}}_{\mathbf{w}})) + \operatorname{Ex}_{\sigma}(\operatorname{lra}(\mathcal{R}^{\mathit{lra}}_{\mathbf{w}})) = \operatorname{Ex}_{\sigma}(\operatorname{tot}(\mathcal{R}^{\mathit{tot}}_{\mathbf{w}}) + \operatorname{lra}(\mathcal{R}^{\mathit{lra}}_{\mathbf{w}})). \end{split}$$

Input : MA $\mathcal{M}$ with initial state $s_I$, objectives $\mathcal{F} = \langle \operatorname{tot}(\mathcal{R}_1), \ldots, \operatorname{tot}(\mathcal{R}_k), \operatorname{lra}(\mathcal{R}_{k+1}), \ldots, \operatorname{lra}(\mathcal{R}_\ell) \rangle$, weight vector $\mathbf{w}$
Output : Value $v_{\mathbf{w}}$, strategy $\sigma_{\mathbf{w}}$ as in the weighted sum optimization problem

1: $\mathcal{C} \leftarrow \mathit{MECS}^0(\mathcal{M}, \mathcal{R}_1, \ldots, \mathcal{R}_k)$ *// Compute maximal 0-ECs and their LRA.*
2: foreach $C \in \mathcal{C}$ do
3: &emsp; Compute $v_C = \max \{\mathrm{Ex}^{\mathcal{M}{\restriction}C, s}_{\sigma}(\operatorname{lra}(\mathcal{R}^{\mathit{lra}}_{\mathbf{w}})) \mid \sigma \in \Sigma^{\mathcal{M}{\restriction}C}_{\mathit{md}}\}$ and an inducing strategy $\sigma_C$
4: Build the quotient $\mathcal{M}^* = \mathcal{M}_{/\mathcal{C}}$ with initial state $s^*_I$
5: Build the reward assignment $\mathcal{R}^*$ for $\mathcal{M}^*$ with

$$\mathcal{R}^*(\langle \hat{s}, \kappa \rangle, s') = \begin{cases} v_C & \text{if } \hat{s} = C \in \mathcal{C},\ \kappa = \perp,\ \text{and } s' = s_\perp \\ \mathcal{R}^{\mathit{tot}}_{\mathbf{w}}(\langle s, \alpha \rangle, s') & \text{if } \hat{s} = C \in \mathcal{C} \text{ and } \kappa = \langle s, \alpha \rangle \in \mathit{exits}(C) \\ \mathcal{R}^{\mathit{tot}}_{\mathbf{w}}(\langle \hat{s}, \kappa \rangle, s') & \text{otherwise} \end{cases}$$

6: Compute $v_{\mathbf{w}} = \max \{\mathrm{Ex}^{\mathcal{M}^*}_{\sigma}(\operatorname{tot}(\mathcal{R}^*)) \mid \sigma \in \Sigma^{\mathcal{M}^*}_{\mathit{md}},\ \mathrm{Pr}^{\mathcal{M}^*}_{\sigma}(\Diamond \{s_\perp\}) = 1\}$ and an inducing strategy $\sigma^* \in \Sigma^{\mathcal{M}^*}_{\mathit{md}}$
7: Build strategy $\sigma_{\mathbf{w}} \in \Sigma^{\mathcal{M}}_{\mathit{md}}$ with

$$\sigma_{\mathbf{w}}(s) = \begin{cases} \sigma_C(s) & \text{if } \exists\, C \in \mathcal{C}\colon s \in \mathit{states}(C) \text{ and } \sigma^*(C) = \perp \\ \alpha & \text{if } \exists\, C \in \mathcal{C}\colon s \in \mathit{states}(C) \text{ and } \sigma^*(C) = \langle s, \alpha \rangle \\ \sigma_{C \Diamond s'}(s) & \text{if } \exists\, C \in \mathcal{C}\colon s \in \mathit{states}(C) \text{ and } \sigma^*(C) = \langle s', \alpha \rangle \text{ for } s' \neq s \\ \sigma^*(s) & \text{otherwise} \end{cases}$$

Algorithm 2: Optimizing the weighted sum for total and LRA objectives

Algorithm 2 outlines the procedure for solving the weighted sum optimization problem. It first computes optimal LRA rewards and inducing strategies for each maximal 0-EC (Lines 1 to 3). Then, a quotient model $\mathcal{M}^*$ and a reward assignment $\mathcal{R}^*$ incorporating all total and LRA rewards is built and analyzed (Lines 4 to 6). $\mathcal{M}^*$ might still contain ECs other than $\{s_\perp\}$. Those ECs shall be left eventually to avoid collecting infinite negative reward for a total reward objective $\operatorname{tot}(\mathcal{R}_i)$. Note that the weight $\mathbf{w}[i]$ for such an objective might be zero,

i.e., the rewards of $\mathcal{R}_i$ are not present in $\mathcal{R}^*$. It is therefore necessary to explicitly restrict the analysis to strategies that almost surely (i.e., with probability 1) reach $s_\perp$. To compute the maximal expected total reward in Line 6 with, e.g., standard value iteration, we can consider another quotient model for $\mathcal{M}^*$ and the 0-ECs of $\mathcal{M}^*$ and $\mathcal{R}^*$. In contrast to Definition 8, this quotient should not introduce the $\perp$ action since it shall not be possible to remain in an EC forever. In Line 7, the strategies for the 0-ECs and for the quotient $\mathcal{M}^*$ are combined into one strategy $\sigma_{\mathbf{w}}$ for $\mathcal{M}$. Here, $\sigma_{C \Diamond s'}$ refers to a strategy of $\mathcal{M}{\restriction}C$ under which every state $s \in \mathit{states}(C)$ eventually reaches $s' \in \mathit{states}(C)$ almost surely.

Since Algorithm 2 produces a memoryless deterministic strategy $\sigma_{\mathbf{w}}$, the point $\mathbf{p_w} \in \mathbb{R}^\ell$ in Line 6 of Algorithm 1 can be computed on the induced sub-MA for $\sigma_{\mathbf{w}}$. Assuming exact single-objective solution methods, the resulting value $v_{\mathbf{w}}$ and strategy $\sigma_{\mathbf{w}} \in \Sigma^{\mathcal{M}}_{\mathit{md}}$ of Algorithm 2 satisfy $v_{\mathbf{w}} = \mathbf{w} \cdot \mathrm{Ex}_{\sigma_{\mathbf{w}}}(\mathcal{F})$, yielding an exact solution to the weighted sum optimization problem. As the number of memoryless deterministic strategies is bounded, we conclude the following, extending results for pure LRA queries [11] to mixtures with total rewards.

Corollary 1. *For total- and LRA reward objectives* F*,* Ach(F) *is closed and is the downward convex hull of at most* |Σ_md^M| = Π_{s∈PS} |Δ(s)| *points.*
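The interplay of the outer loop (Algorithm 1) and the weighted-sum optimization (Algorithm 2) can be pictured with a minimal, hypothetical sketch: here a small finite point set stands in for the achievable points induced by memoryless deterministic strategies, and `weighted_sum_opt` plays the role of the single-objective call. All names are illustrative, not Storm's API.

```python
def weighted_sum_opt(points, w):
    """Stand-in for Algorithm 2: return an achievable point maximizing w·p."""
    return max(points, key=lambda p: sum(wi * pi for wi, pi in zip(w, p)))

def pareto_approx(points, weights):
    """Collect the optima of several weighted sums; their downward convex
    hull under-approximates the achievable set (cf. Algorithm 1)."""
    found = []
    for w in weights:
        p = weighted_sum_opt(points, w)
        if p not in found:
            found.append(p)
    return found

# Hypothetical achievable points for two maximizing objectives; the
# dominated point (1, 1) never optimizes any non-trivial weighted sum.
pts = [(0.0, 3.0), (1.0, 1.0), (2.0, 2.0), (3.0, 0.0)]
optima = pareto_approx(pts, [(1, 0), (0, 1), (1, 1)])
```

Each returned point is a vertex of the downward convex hull from Corollary 1; refining the weight set refines the approximation.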

*Remark 3.* Our framework can be extended to support objectives beyond total- and LRA rewards. *Minimizing objectives*, where one is interested in a strategy σ that induces a *small* expected value, can be considered by multiplying all rewards with −1. Since we already allow negative values in reward assignments, no further adaptations are necessary. We emphasize that our framework lifts a restriction imposed in [28] that disabled a simultaneous analysis of maximizing *and* minimizing total reward objectives. *Reachability probabilities* can be transformed to expected total rewards on a modified model in which the information whether a goal state has already been visited is stored in the state space. *Goal-bounded* total rewards as in [30], where no further rewards are collected as soon as one of the goal states is reached, can be transformed similarly. For MDP, *step- and reward-bounded* reachability probabilities can be converted to total reward objectives by unfolding the current amount of steps (or rewards) into the state space of the model. Approaches that avoid such an expensive unfolding have been presented in [28] for objectives with step-bounds and in [34,35] for objectives with one or multiple reward-bounds. *Time-bounded* reachability probabilities for MA have been considered in [47]. Finally, ω-regular specifications such as *linear temporal logic (LTL)* formulae have been transformed to total reward objectives in [27]. However, the optimization of LRA rewards within the ECs of the model might interfere with the satisfaction of one or more ω-regular specifications [31].
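The reachability-to-total-reward transformation from the remark can be sketched on a dictionary-based MDP encoding (a minimal illustration; the function and state names are our own, not from [28] or Storm):

```python
def reachability_to_total_reward(transitions, goal):
    """Track whether a goal state was already visited in the state space:
    states become (s, seen), and a reward of 1 is collected exactly on the
    step that first enters a goal state. The maximal expected total reward
    in the new model then equals the maximal reachability probability in
    the old one. `transitions[s][a]` is a list of (probability, successor)
    pairs."""
    new_trans, rewards = {}, {}
    for s, acts in transitions.items():
        for seen in (False, True):
            new_acts = {}
            for a, succs in acts.items():
                branches = []
                for p, t in succs:
                    branches.append((p, (t, seen or t in goal)))
                    # reward only on the first entry into the goal set
                    rewards[((s, seen), a, t)] = (
                        0.0 if seen or t not in goal else 1.0)
                new_acts[a] = branches
            new_trans[(s, seen)] = new_acts
    return new_trans, rewards

# hypothetical two-successor example: from s0, action 'a' reaches the
# goal state 'g' with probability 1/2
mdp = {'s0': {'a': [(0.5, 'g'), (0.5, 'dead')]}, 'g': {}, 'dead': {}}
trans, rew = reachability_to_total_reward(mdp, goal={'g'})
```

The doubled state space is the price of the encoding; the unfolding approaches cited above exist precisely to avoid this blow-up for step- and reward-bounded variants.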

### 5 Experimental Evaluation

*Implementation details* Our approach has been implemented in the model checker Storm [40]. Given an MA or MDP (specified using the PRISM language or JANI [14]), the tool answers qualitative and quantitative achievability as well as Pareto queries. Besides mixtures of total- and LRA reward objectives, Storm also supports most of the extensions in Remark 3—with the notable exception of LTL. We use LRA value iteration [17,1] and sound value iteration [48] for calls to single-objective model checking. Both provide sound precision guarantees, i.e., the relative error of these computations is at most ε, where we set ε = 10^−6.

*Workstation cluster* To showcase the capabilities of our implementation, we present a workstation cluster—originally considered in [39] as a CTMC—now modeled as an MA. The cluster consists of two sub-clusters, each comprising one *switch* and N *workstations*. Within each sub-cluster the workstations are connected to the switch in a star topology, and the two switches are connected with a *backbone*. Each of the components may fail with a certain rate. A controller can (i) acquire additional repair units (up to M) and (ii) control the movements of the repair units. In Fig. 2a we depict the resulting sets of achievable points—as computed by our implementation—for N = 16 and M = 4. As objectives, we considered the long-run average number of operating workstations lra(R^{#op}), the long-run average probability that at least N workstations are operational lra(R^{#op≥N}), and the total number of acquired repair units tot(R^{#rep}).

*Related tools* MultiGain [12] is an extension of PRISM [45] that implements the LP-based approach of [11] for multiple LRA objectives on MDP to answer

(a) Results for workstation cluster (b) Comparison of Storm and MultiGain

Figure 2: Exemplary results and runtime comparison with MultiGain

qualitative and quantitative achievability as well as Pareto queries. For the latter, it is briefly mentioned in [12] that ideas of [28] were used similarly to our approach, but no further details are provided. MultiGain does not support MA, *mixtures* with total reward objectives, or Pareto queries with > 2 objectives. However, it does support more general quantitative achievability queries.

PRISM-games [44,43] implements value iteration over convex sets [8,9] to analyze multiple LRA reward objectives on stochastic games (SGs). By converting MDPs to 1-player SGs, PRISM-games could also be applied in our setting. However, some experiments on 1-player SGs indicated that this approach is not competitive compared to the dedicated MDP implementations in MultiGain and Storm. We therefore do not consider PRISM-games in our evaluation.

*Benchmarks* We consider 10 different case studies including the workstation cluster (clu) as well as benchmarks from QVBS [37] (dpm, rqs, res), from MultiGain [12] (mut, phi, vir), from [42] (csn, sen), and from [47] (pol). For each case study we consider 3 concrete instances, resulting in 12 MAs and 18 MDPs. The analyzed objectives range over LRA rewards, (goal-bounded) total rewards, and time-, step- and unbounded reachability probabilities.

*Set-up* We evaluated the performance of Storm and MultiGain Version 1.0.2<sup>3</sup>. All experiments were run on 4 cores<sup>4</sup> of an Intel Xeon Platinum 8160 CPU with

<sup>3</sup> Obtained from http://qav.cs.ox.ac.uk/multigain and invoked with Gurobi [32].

<sup>4</sup> Storm uses one core, MultiGain uses multiple cores due to Java's garbage collection and Gurobi's parallel solving techniques.


Table 1: Results for pure LRA Pareto queries

a time limit of 2 hours and 32 GB RAM. For each experiment we measured the total runtime (including model building) to solve one query. For qualitative and quantitative achievability we consider thresholds close to the Pareto front. For Pareto queries, the approximation precision 10^−4 was set for both tools.

*Results* Fig. 2b visualizes the runtime comparison with MultiGain. A point (x, y) in the plot corresponds to a query that has been solved by Storm in x seconds and by MultiGain in y seconds. Points on the solid diagonal mean that both tools were equally fast. The two dotted lines indicate experiments where Storm only required 1/10 resp. 1/100 of the time of MultiGain. TO and MO indicate a time- or memory-out. Tables 1 and 2 provide further data for Pareto queries. The columns indicate model name and parameters, the number of LRA reward, total reward, and bounded reachability objectives, the number of states (|S|), Markovian states (|MS|), successor distributions (|Δ|), 0-ECs (|C|), and states within 0-ECs (|S_C|) of the MA or MDP, the number of iterations (#iters) of Algorithm 1 performed by Storm, and the total runtime of Storm and MultiGain in seconds. Runtimes are omitted if the tool does not support the query. MDP (MA) benchmarks are at the top (bottom) of each table. Table 1 considers pure LRA queries, whereas Table 2 considers mixtures.


Table 2: Results for Pareto queries with other objective types

*Discussion* As indicated in Fig. 2b, our implementation outperforms MultiGain on almost all benchmarks and for all types of queries and is often orders of magnitude faster. According to MultiGain's log files, the majority of its runtime is spent on solving LPs, suggesting that the better performance of Storm is likely due to the iterative approach presented in this work.

Table 1 shows that *pure LRA queries on models with millions of states can be handled*. There were no significant runtime gaps between MA and MDP models. For csn, the increased number of objectives drastically increases the overall runtime. This is partly due to our naive implementation of the geometric set representations used in Algorithm 1. Table 2 indicates that the performance and scalability for mixtures of LRA and other types of objectives are similar. An exception are queries involving time-bounded reachability on MA (e.g., dpm). Here, our implementation is based on the single-objective approach of [29], which is known to be slower than more recent methods [16,15].

*Data availability* The implementation, models, and log files are available at [49].

### 6 Conclusion

The analysis of multi-objective model checking queries involving multiple long-run average rewards can be incorporated into the framework of [28], enabling (i) the use of off-the-shelf single-objective algorithms for LRA and (ii) the combination with other kinds of objectives such as total rewards. Our experiments indicate that this approach clearly outperforms existing algorithms based on linear programming. Future work includes lifting the approach to *partially observable MDPs* and *stochastic games*, potentially using ideas of [10] and [2], respectively.

### References


Pareto curves. ACM Trans. Model. Comput. Simul. 29(4), 27:1–27:31 (2019). https://doi.org/10.1145/3309683


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

#### **Inferring Expected Runtimes of Probabilistic Integer Programs Using Expected Sizes**

Fabian Meyer, Marcel Hark, and Jürgen Giesl

LuFG Informatik 2, RWTH Aachen University, Aachen, Germany
fabian.niklas.meyer@rwth-aachen.de, {marcel.hark,giesl}@cs.rwth-aachen.de

**Abstract.** We present a novel modular approach to infer upper bounds on the expected runtimes of probabilistic integer programs automatically. To this end, it computes bounds on the runtimes of program parts and on the sizes of their variables in an alternating way. To evaluate its power, we implemented our approach in a new version of our open-source tool KoAT.

### **1 Introduction**

There exist several approaches and tools for automatic complexity analysis of non-probabilistic programs, e.g., [2–6, 8, 9, 18, 20, 21, 27, 28, 30, 34–36, 51, 57, 58]. While most of them rely on basic techniques like ranking functions (see, e.g., [6, 12–14, 17, 53]), they usually combine these basic techniques in sophisticated ways. For example, in [18] we developed a modular approach for automated complexity analysis of integer programs, based on an alternation between finding symbolic runtime bounds for program parts and using them to infer bounds on the sizes of variables in such parts. So each analysis step is restricted to a small part of the program. The implementation of this approach in KoAT [18] (which is integrated in AProVE [30]) is one of the leading tools for complexity analysis [31].

While there exist several adaptations of basic techniques like ranking functions to probabilistic programs (e.g., [1, 11, 15, 16, 22–26, 29, 32, 37, 38, 48, 62]), most of the sophisticated full approaches for complexity analysis have not been adapted to probabilistic programs yet, and there are only a few powerful tools available which analyze the runtimes of probabilistic programs automatically [10,50,61,62].

We study probabilistic integer programs (Sect. 2) and define suitable notions of non-probabilistic and expected runtime and size bounds (Sect. 3). Then, we adapt our modular approach for runtime and size analysis of [18] to probabilistic programs (Sect. 4 and 5). So such an adaptation is not only possible for basic techniques like ranking functions, but also for full approaches for complexity analysis.

For this adaptation, several problems had to be solved. When computing expected runtime or size bounds for new program parts, the main difficulty is to determine when it is sound to use expected bounds on previous program parts and when one has to use non-probabilistic bounds instead. Moreover, the semantics of probabilistic programs is significantly different from classical integer programs. Thus, the proofs of our techniques differ substantially from the ones in [18], e.g.,

funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 235950644 (Project GI 274/6-2) & DFG Research Training Group 2236 UnRAVeL

© The Author(s) 2021

J. F. Groote and K. G. Larsen (Eds.): TACAS 2021, LNCS 12651, pp. 250–269, 2021.

https://doi.org/10.1007/978-3-030-72016-2_14

we have to use concepts from measure theory like ranking supermartingales.

In Sect. 6, we evaluate the implementation of our new approach in the tool KoAT [18, 43] and compare with related work. We refer to [47] for an appendix of our paper containing all proofs, preliminaries from probability and measure theory, and an overview on the benchmark collection used in our evaluation.

### **2 Probabilistic Integer Programs**

For any set M ⊆ ℝ̄ (with ℝ̄ = ℝ ∪ {∞}) and w ∈ M, let M_{≥w} = {v ∈ M | v ≥ w ∨ v = ∞}. For a set PV of program variables, we first introduce the kind of bounds that our approach computes. Similar to [18], our bounds represent weakly monotonically increasing functions from PV → ℝ̄_{≥0}. Such bounds have the advantage that they can easily be "composed", i.e., if f and g are both weakly monotonically increasing upper bounds, then so is f ∘ g.

**Definition 1 (Bounds).** The set of bounds B is the smallest set with PV ∪ ℝ_{≥0} ⊆ B, and where b1, b2 ∈ B and v ∈ ℝ_{≥1} imply b1 + b2, b1 · b2 ∈ B, and v^{b1} ∈ B.
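The grammar of Def. 1 can be sketched as a tiny evaluator (an illustrative encoding of ours, not KoAT's internal representation): bounds are variables, non-negative constants, sums, products, or powers v^b with v ≥ 1, and every constructor is weakly monotonically increasing, so enlarging the variable values never decreases a bound.

```python
def eval_bound(b, env):
    """Evaluate a bound from Def. 1 over a variable assignment `env`.
    A bound is a variable name (string), a non-negative constant, or a
    tuple ('+', b1, b2), ('*', b1, b2), or ('pow', v, b1) with v >= 1."""
    if isinstance(b, str):
        return env[b]
    if isinstance(b, (int, float)):
        return b
    op, left, right = b
    if op == '+':
        return eval_bound(left, env) + eval_bound(right, env)
    if op == '*':
        return eval_bound(left, env) * eval_bound(right, env)
    if op == 'pow':              # left is the constant base v >= 1
        return left ** eval_bound(right, env)
    raise ValueError(f"unknown constructor {op!r}")

# example bound b = 2*x + 3^y, an element of B
b = ('+', ('*', 2, 'x'), ('pow', 3, 'y'))
small = eval_bound(b, {'x': 1, 'y': 1})   # 2 + 3 = 5
large = eval_bound(b, {'x': 2, 'y': 2})   # 4 + 9 = 13
```

Monotonicity is what makes composition sound: substituting one bound into another (f ∘ g) again yields an upper bound.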

Our notion of probabilistic programs combines classical integer programs (as in, e.g., [18]) and probabilistic control flow graphs (see, e.g., [1]). A state s is a variable assignment s: V → ℤ for the (finite) set V of all variables in the program, where PV ⊆ V, V \ PV is the set of temporary variables, and Σ is the set of all states. For any s ∈ Σ, the state |s| is defined by |s|(x) = |s(x)| for all x ∈ V. The set C of constraints is the smallest set containing e1 ≤ e2 for all polynomials e1, e2 ∈ ℤ[V] and c1 ∧ c2 for all c1, c2 ∈ C. In addition to "≤", in examples we also use relations like ">", which can be simulated by constraints (e.g., e1 > e2 is equivalent to e2 + 1 ≤ e1 when regarding integers). We also allow the application of states to arithmetic expressions e and constraints c. Then the number s(e) resp. the truth value s(c) ∈ {**t**, **f**} results from evaluating the expression resp. the constraint when substituting every variable x by s(x). So for bounds b ∈ B, we have |s|(b) ∈ ℝ_{≥0}.

In the transitions of a program, a program variable x ∈ PV can also be updated by adding a value according to a bounded distribution function d: Σ → Dist(ℤ). Here, for any state s, d(s) is the probability distribution of the values that are added to x. As usual, a probability distribution on ℤ is a mapping pr: ℤ → ℝ with pr(v) ∈ [0, 1] for all v ∈ ℤ and Σ_{v∈ℤ} pr(v) = 1. Let Dist(ℤ) be the set of distributions pr whose expected value E(pr) = Σ_{v∈ℤ} v · pr(v) is well defined and finite, i.e., E_abs(pr) = Σ_{v∈ℤ} |v| · pr(v) < ∞. A distribution function d: Σ → Dist(ℤ) is bounded if there is a finite bound E(d) ∈ B with E_abs(d(s)) ≤ |s|(E(d)) for all s ∈ Σ. Let D denote the set of all bounded distribution functions (our implementation supports Bernoulli, uniform, geometric, hypergeometric, and binomial distributions, see [43] for details).
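The quantities E(pr) and E_abs(pr) can be checked numerically on a truncated support (a minimal sketch of ours; we assume the convention that the geometric distribution with parameter 1/2 ranges over {1, 2, ...} with pr(v) = (1/2)^v):

```python
def expected_values(pr, support):
    """E(pr) = sum of v*pr(v) and E_abs(pr) = sum of |v|*pr(v), evaluated
    over a finite support (a truncation for distributions on all of Z)."""
    e = sum(v * pr(v) for v in support)
    e_abs = sum(abs(v) * pr(v) for v in support)
    return e, e_abs

# geometric distribution with parameter 1/2: pr(v) = (1/2)^v for v >= 1
pr_geo = lambda v: 0.5 ** v
e, e_abs = expected_values(pr_geo, range(1, 200))   # e is approximately 2
```

For this non-negative distribution E and E_abs coincide; a finite E_abs is exactly what makes the distribution admissible in D.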

**Definition 2 (PIP).** (PV, L, GT, ℓ0) is a probabilistic integer program with


*(Fig. 1 depicts locations ℓ0, ℓ1, ℓ2 with the transitions: t0 ∈ g0 from ℓ0 to ℓ1 with η(x) = x, η(y) = y; t1 ∈ g1 looping on ℓ1 with p = 1/2, τ = (x > 0), η(x) = x − 1, η(y) = y + x; t2 ∈ g1 looping on ℓ1 with p = 1/2, τ = (x > 0), η(x) = x, η(y) = y + x; t3 ∈ g2 from ℓ1 to ℓ2 with η(x) = x, η(y) = y; t4 ∈ g3 looping on ℓ2 with τ = (y > 0), η(x) = x, η(y) = y − 1.)*

Fig. 1: PIP with non-deterministic and probabilistic branching


PIPs allow for both probabilistic and non-deterministic branching and sampling. Probabilistic branching is modeled by selecting a transition out of a non-singleton general transition. Non-deterministic branching is represented by several general transitions with the same start location and non-exclusive guards. Probabilistic sampling is realized by update functions that map a program variable to a bounded distribution function. Non-deterministic sampling is modeled by updating a program variable with an expression containing temporary variables from V \ PV, whose values are non-deterministic (but can be restricted in the guard). The set of initial general transitions GT_0 ⊆ GT consists of all general transitions with start location ℓ0.

Example 3 (PIP). *Consider the PIP in Fig. 1 with initial location* ℓ0 *and the program variables* PV = {x, y}*. Here, let* p = 1 *and* τ = **t** *if not stated explicitly. There are four general transitions:* g0 = {t0}*,* g1 = {t1, t2}*,* g2 = {t3}*, and* g3 = {t4}*, where* g1 *and* g2 *represent a non-deterministic branching. When choosing the general transition* g1*, the transitions* t1 *and* t2 *encode a probabilistic branching. If we modified the update* η *and the guard* τ *of* t0 *to* η(x) = u ∈ V \ PV *and* τ = (u > 0)*, then* x *would be updated to a non-deterministically chosen positive value. In contrast, if* η(x) = GEO(1/2)*, then* t0 *would update* x *by adding a value sampled from the geometric distribution with parameter* 1/2*.*
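The probabilistic branching of g1 can be simulated to make the later expected-runtime bound tangible (a sketch of ours, not KoAT output; Ex. 9 below derives the expected bound RB_E(g1) = 2 · x, i.e., 10 for x = 5):

```python
import random

def g1_steps(x, rng):
    """Simulate g1 from Fig. 1: while the guard x > 0 holds, t1
    (x := x - 1) and t2 (x unchanged) are each taken with probability 1/2;
    both also add x to y, which we ignore here. Returns how often g1 fires
    before x reaches 0."""
    steps = 0
    while x > 0:
        steps += 1
        if rng.random() < 0.5:   # transition t1 chosen
            x -= 1
    return steps

rng = random.Random(42)          # fixed seed for reproducibility
n = 20_000
avg = sum(g1_steps(5, rng) for _ in range(n)) / n   # close to 2*5 = 10
```

Each decrement of x takes a geometrically distributed number of g1-firings with expectation 2, which is where the factor 2 in 2 · x comes from.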

In the following, we regard a fixed PIP P as in Def. 2. A configuration is a tuple (ℓ, t, s) with the current location ℓ ∈ L, the current state s ∈ Σ, and the transition t that was evaluated last and led to the current configuration. Let T = ∪_{g∈GT} g. Then Conf = (L ⊎ {⊥}) × (T ⊎ {t_in, t_⊥}) × Σ is the set of all configurations, with a special location ⊥ indicating the termination of a run, and special transitions t_in (used in the first configuration of a run) and t_⊥ (for the configurations of the run after termination). The (virtual) general transition g_⊥ = {t_⊥} only contains t_⊥. A run of a PIP is an infinite sequence ϑ = c0 c1 ··· ∈ Conf^ω. Let Runs = Conf^ω and let FPath = Conf^∗ be the set of all finite paths of configurations.

In our setting, deterministic Markovian schedulers suffice to resolve all non-determinism (see, e.g., [54, Prop. 6.2.1]). For c = (ℓ, t, s) ∈ Conf, such a scheduler S yields a pair S(c) = (g, s′) where g is the next general transition to be taken (with start location ℓ) and s′ chooses values for the temporary variables, where s′(τ_g) = **t** and s(x) = s′(x) for all x ∈ PV. If GT contains no such g, we get S(c) = (g_⊥, s).

For each scheduler S and initial state s0, we first define a probability mass function pr_{S,s0}. For all c ∈ Conf, pr_{S,s0}(c) is the probability that a run starts in c. Thus, pr_{S,s0}(c) = 1 if c = (ℓ0, t_in, s0) and pr_{S,s0}(c) = 0 otherwise. Moreover, for all c, c′ ∈ Conf, pr_{S,s0}(c → c′) is the probability that the configuration c is followed by the configuration c′ (see [47] for the formal definition of pr_{S,s0}).

For any f = c0 ··· cn ∈ FPath, let pr_{S,s0}(f) = pr_{S,s0}(c0) · pr_{S,s0}(c0 → c1) · ... · pr_{S,s0}(c_{n−1} → c_n). We say that f is admissible for S and s0 if pr_{S,s0}(f) > 0. A run ϑ is admissible if all its finite prefixes are admissible. A configuration c ∈ Conf is admissible if there is some admissible finite path ending in c.

The semantics of PIPs can now be defined by giving a corresponding probability space, which is obtained by a standard cylinder construction (see, e.g., [7,60]). Let P_{S,s0} denote the corresponding probability measure which lifts pr_{S,s0} to cylinder sets: For any f ∈ FPath, we have pr_{S,s0}(f) = P_{S,s0}(Pre_f) for the set Pre_f of all runs with prefix f. So P_{S,s0}(Θ) is the probability that a run from Θ ⊆ Runs is obtained when using the scheduler S and starting in s0.

We denote the associated expected value operator by E_{S,s0}. So for any random variable X: Runs → ℕ̄ = ℕ ∪ {∞}, we have E_{S,s0}(X) = Σ_{n∈ℕ̄} n · P_{S,s0}(X = n). For details on the preliminaries from probability theory we refer to [47].

### **3 Complexity Bounds**

In Sect. 3.1, we first recapitulate the concepts of (non-probabilistic) runtime and size bounds from [18]. Then we introduce expected runtime and size bounds in Sect. 3.2 and connect them to their non-probabilistic counterparts.

#### **3.1 Runtime and Size Bounds**

Again, let P denote the PIP which we want to analyze. Def. 4 recapitulates the notions of runtime and size bounds from [18] in our setting. Recall that bounds from B do not contain temporary variables, i.e., we always try to infer bounds in terms of the initial values of the program variables. Let sup ∅ = 0, as all occurring sets are subsets of ℝ̄_{≥0}, whose minimal element is 0.

**Definition 4 (Runtime and Size Bounds** [18]**).** RB: T → B is a runtime bound and SB: T × V → B is a size bound if for all transitions t ∈ T, all variables x ∈ V, all schedulers S, and all states s0 ∈ Σ, we have

$$\begin{array}{ll} |s_0|\left(\mathcal{RB}(t)\right) & \geq \sup\left\{\, \left|\{i \mid t_i = t\}\right| \;\middle|\; f = (\_, t_0, \_) \cdots (\_, t_n, \_) \wedge pr_{\mathfrak{S},s_0}(f) > 0 \,\right\}, \\ |s_0|\left(\mathcal{SB}(t, x)\right) & \geq \sup\left\{\, |s(x)| \;\middle|\; f = \cdots (\_, t, s) \wedge pr_{\mathfrak{S},s_0}(f) > 0 \,\right\}. \end{array}$$

So RB(t) is a bound on the number of executions of t, and SB(t, x) over-approximates the greatest absolute value that x ∈ V takes after the application of the transition t in any admissible finite path. Note that Def. 4 does not apply to t_in and t_⊥, since they are not contained in T.

We call a tuple (RB, SB) a (non-probabilistic) bound pair. We will use such non-probabilistic bound pairs for an initialization of expected bounds (Thm. 10) and to compute improved expected runtime and size bounds in Sect. 4 and 5.

Example 5 (Bound Pair). *The technique of [18] computes the following bound pair for the PIP of Fig. 1 (by ignoring the probabilities of the transitions).*

$$\mathcal{RB}(t) = \begin{cases} 1, & \text{if } t = t_0 \text{ or } t = t_3 \\ x, & \text{if } t = t_1 \\ \infty, & \text{if } t = t_2 \text{ or } t = t_4 \end{cases} \qquad \mathcal{SB}(t, x) = \begin{cases} x, & \text{if } t \in \{t_0, t_1, t_2\} \\ 3 \cdot x, & \text{if } t \in \{t_3, t_4\} \end{cases}$$

*Clearly,* t0 *and* t3 *can only be evaluated once. Since* t1 *decrements* x *and no transition increments it,* t1*'s runtime is bounded by* |s0|(x)*. However,* t2 *can be executed arbitrarily often if* s0(x) > 0*. Thus, the runtimes of* t2 *and* t4 *are unbounded (i.e.,* P *is not terminating when regarded as a non-probabilistic program).* SB(t, x) *is finite for all transitions* t*, since* x *is never increased. In contrast, the value of* y *can be arbitrarily large after all transitions but* t0*.*

### **3.2 Expected Runtime and Size Bounds**

We now define the expected runtime and size complexity of a PIP P.

**Definition 6 (Expected Runtime Complexity, PAST** [15]**).** For g ∈ GT, its runtime is the random variable R(g) where R: GT → (Runs → ℕ̄) with

$$\mathcal{R}(g)\big( (\_, t_0, \_)\, (\_, t_1, \_) \cdots \big) = \left|\, \{ i \mid t_i \in g \} \,\right|$$

For a scheduler <sup>S</sup> and <sup>s</sup><sup>0</sup> <sup>∈</sup> <sup>Σ</sup>, the expected runtime complexity of <sup>g</sup> ∈ GT is <sup>E</sup><sup>S</sup>,s<sup>0</sup> (R(g)) and the expected runtime complexity of <sup>P</sup> is <sup>g</sup>∈GT <sup>E</sup><sup>S</sup>,s<sup>0</sup> (R(g)).

If P's expected runtime complexity is finite for every scheduler S and every initial state s0, then P is called positively almost surely terminating (PAST).

So R(g)(ϑ) is the number of executions of a transition from g in the run ϑ.

While non-probabilistic size bounds refer to pairs (t, x) of transitions t ∈ T and variables x ∈ V (so-called result variables in [18]), we now introduce expected size bounds for general result variables (g, ℓ, x), which consist of a general transition g, one of its target locations ℓ, and a program variable x ∈ PV. So x must not be a temporary variable (which represents non-probabilistic non-determinism), since general result variables are used for expected size bounds.

**Definition 7 (Expected Size Complexity).** The set of general result variables is GRV = { (g, ℓ, x) | g ∈ GT, x ∈ PV, (\_, \_, \_, \_, ℓ) ∈ g }. The size of α = (g, ℓ, x) ∈ GRV is the random variable S(α) where S: GRV → (Runs → ℕ̄) with

$$\mathcal{S}(g,\ell,x)\left(\left(\ell_0,t_0,s_0\right)\left(\ell_1,t_1,s_1\right)\cdots\right) = \sup\left\{ |s_i(x)| \;\middle|\; \ell_i = \ell \wedge t_i \in g \right\}.$$

For a scheduler S and s0, the expected size complexity of α ∈ GRV is E_{S,s0}(S(α)).

So for any run ϑ, S(g, ℓ, x)(ϑ) is the greatest absolute value of x in location ℓ, whenever ℓ was entered with a transition from g. We now define bounds for the expected runtime and size complexity which hold independently of the scheduler.

#### **Definition 8 (Expected Runtime and Size Bounds).**


Example 9 (Expected Runtime and Size Bounds). *Our new techniques from Sect. 4 and 5 will derive the following expected bounds for the PIP from Fig. 1.*

$$\begin{aligned} \mathcal{RB}_{\mathbb{E}}(g) &= \begin{cases} 1, & \text{if } g \in \{g_0, g_2\} \\ 2 \cdot x, & \text{if } g = g_1 \\ 6 \cdot x^2 + 2 \cdot y, & \text{if } g = g_3 \end{cases} & \mathcal{SB}_{\mathbb{E}}(g, \cdot, x) = \begin{cases} x, & \text{if } g = g_0 \\ 2 \cdot x, & \text{if } g = g_1 \\ 3 \cdot x, & \text{if } g \in \{g_2, g_3\} \end{cases} \\ \mathcal{SB}_{\mathbb{E}}(g_0, \ell_1, y) &= y & \mathcal{SB}_{\mathbb{E}}(g_2, \ell_2, y) = 6 \cdot x^2 + 2 \cdot y \\ \mathcal{SB}_{\mathbb{E}}(g_1, \ell_1, y) &= 6 \cdot x^2 + y & \mathcal{SB}_{\mathbb{E}}(g_3, \ell_2, y) = 12 \cdot x^2 + 4 \cdot y \end{aligned}$$

*While the runtimes of* t2 *and* t4 *were unbounded in the non-probabilistic case (Ex. 5), we obtain finite bounds on the expected runtimes of* g1 = {t1, t2} *and* g3 = {t4}*. For example, we can expect* x *to be non-positive after at most* |s0|(2 · x) *iterations of* g1*. Based on the above expected runtime bounds, the expected runtime complexity of the PIP is at most* |s0|(RB_E(g0) + ... + RB_E(g3)) = |s0|(2 + 2 · x + 2 · y + 6 · x^2)*, i.e., it is in* O(n^2)*, where* n *is the maximal absolute value of the program variables at the start of the program.*

The following theorem shows that non-probabilistic bounds can be lifted to expected bounds, since they do not only bound the expected value of <sup>R</sup>(g) resp. <sup>S</sup>(α), but the whole distribution. As mentioned, all proofs can be found in [47].

**Theorem 10 (Lifting Bounds).** For a bound pair (RB, SB), the pair (RB_E, SB_E) with RB_E(g) = Σ_{t∈g} RB(t) and SB_E(g, ℓ, x) = Σ_{t=(\_,\_,\_,\_,ℓ)∈g} SB(t, x) is an expected bound pair.

Here, we over-approximate the maximum of the bounds SB(t, x) for t = (\_, \_, \_, \_, ℓ) ∈ g by their sum. For asymptotic bounds, this does not affect precision, since max(f, g) and f + g have the same asymptotic growth for any non-negative functions f, g.

Example 11 (Lifting of Bounds). *When lifting the bound pair of Ex. 5 to expected bounds according to Thm. 10, one obtains* RB_E(g0) = RB_E(g2) = 1 *and* RB_E(g1) = RB_E(g3) = ∞*. Moreover,* SB_E(g0, ℓ1, x) = x*,* SB_E(g1, ℓ1, x) = 2 · x*,* SB_E(g2, ℓ2, x) = SB_E(g3, ℓ2, x) = 3 · x*,* SB_E(g0, ℓ1, y) = y*, and* SB_E(g, ℓ, y) = ∞ *whenever* g ≠ g0*. Thus, with these lifted bounds one cannot show that* P*'s expected runtime complexity is finite, i.e., they are substantially less precise than the finite expected bounds from Ex. 9. Our approach will compute such finite expected bounds by repeatedly improving the lifted bounds of Thm. 10.*

### **4 Computing Expected Runtime Bounds**

We first present a new variant of probabilistic linear ranking functions in Sect. 4.1. Based on this, in Sect. 4.2 we introduce our modular technique to infer expected runtime bounds by using expected size bounds.

#### **4.1 Probabilistic Linear Ranking Functions**

For probabilistic programs, several techniques based on ranking supermartingales have been developed. In this section, we define a class of probabilistic ranking functions that will be suitable for our modular analysis.

We restrict ourselves to ranking functions $r : \mathcal{L} \to \mathbb{R}[\mathcal{PV}]_{\text{lin}}$ that map every location to a linear polynomial (i.e., of degree at most 1) without temporary variables. The linearity restriction is common to ease the automated inference of ranking functions. Moreover, this restriction will be needed for the soundness of our technique. Nevertheless, our approach of course also infers non-linear expected runtimes (by combining the linear bounds obtained for different program parts).

Let $\exp_{r,g,s}$ denote the expected value of $r$ after an execution of $g \in \mathcal{GT}$ in state $s \in \Sigma$. Here, $s_\eta(x)$ is the expected value of $x \in \mathcal{PV}$ after performing the update $\eta$ in state $s$. So if $\eta(x) \in \mathcal{D}$, then $x$'s expected value after the update results from adding the expected value of the probability distribution $\eta(x)(s)$ to $s(x)$:

$$\exp_{r,g,s} = \sum_{(\ell,p,\tau,\eta,\ell') \in g} p \cdot s_\eta(r(\ell')) \;\text{ with }\; s_\eta(x) = \begin{cases} s(\eta(x)), & \text{if } \eta(x) \in \mathbb{Z}[\mathcal{V}] \\ s(x) + \mathbb{E}(\eta(x)(s)), & \text{if } \eta(x) \in \mathcal{D} \end{cases}$$
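The definition above can be replayed directly in code. The following sketch (hypothetical encoding: a general transition is a list of `(probability, update, target location)` triples, updates are functions on states) computes $\exp_{r,g,s}$ for $g_1$ of Fig. 1:

```python
from fractions import Fraction

# Sketch of exp_{r,g,s}: the expected value of the ranking function r after
# one execution of the general transition g in state s.  Deterministic
# updates are evaluated in s; a distribution update would contribute
# s(x) + E(eta(x)(s)) instead.

def expected_after(g, r, s):
    total = Fraction(0)
    for p, eta, target in g:                 # (prob, update, target location)
        s_eta = {x: upd(s) for x, upd in eta.items()}
        total += Fraction(p) * r[target](s_eta)
    return total

# g1 of Fig. 1: with probability 1/2, x -> x - 1; with probability 1/2,
# x is unchanged.  Both branches stay in location l1.
g1 = [
    (Fraction(1, 2), {"x": lambda s: s["x"] - 1, "y": lambda s: s["y"]}, "l1"),
    (Fraction(1, 2), {"x": lambda s: s["x"],     "y": lambda s: s["y"]}, "l1"),
]
r = {"l1": lambda s: 2 * s["x"]}             # the PLRF of Ex. 13
s = {"x": 5, "y": 2}
print(expected_after(g1, r, s))              # 2*5 - 1 = 9, i.e. s(r(l1)) - 1
```

For this $g_1$, the result is always $s(r(\ell_1)) - 1$, which is exactly the Decrease condition of the PLRF defined next.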

**Definition 12 (PLRF).** Let $\mathcal{GT}_> \subseteq \mathcal{GT}_{\text{ni}} \subseteq \mathcal{GT}$. Then $r: \mathcal{L} \to \mathbb{R}[\mathcal{PV}]_{\text{lin}}$ is a probabilistic linear ranking function (PLRF) for $\mathcal{GT}_>$ and $\mathcal{GT}_{\text{ni}}$ if for all $g \in \mathcal{GT}_{\text{ni}} \setminus \mathcal{GT}_>$ and $c' \in \mathsf{Conf}$ there is a $\bowtie_{g,c'} \in \{<, \geq\}$ such that for all finite paths $c_0 \cdots c'\, c$ that are admissible for some $\mathfrak{S}$ and $s_0 \in \Sigma$, and where $c = (\ell, t, s)$ (i.e., where $t$ is the transition that is used in the step from $c'$ to $c$), we have:

*Boundedness (a):* If $t \in g$ for a $g \in \mathcal{GT}_{\text{ni}} \setminus \mathcal{GT}_>$, then $s(r(\ell)) \bowtie_{g,c'} 0$.

*Boundedness (b):* If $t \in g$ for a $g \in \mathcal{GT}_>$, then $s(r(\ell)) \geq 0$.

*Non-Increase:* If $\ell = \ell_g$ for a $g \in \mathcal{GT}_{\text{ni}}$ and $s(\tau_g) = \mathbf{t}$, then $s(r(\ell)) \geq \exp_{r,g,s}$.

*Decrease:* If $\ell = \ell_g$ for a $g \in \mathcal{GT}_>$ and $s(\tau_g) = \mathbf{t}$, then $s(r(\ell)) - 1 \geq \exp_{r,g,s}$.

So if one is restricted to the sub-program with the non-increasing transitions $\mathcal{GT}_{\text{ni}}$, then $r(\ell)$ is an upper bound on the expected number of applications of transitions from $\mathcal{GT}_>$ when starting in $\ell$. Hence, a PLRF for $\mathcal{GT}_> = \mathcal{GT}_{\text{ni}} = \mathcal{GT}$ would imply that the program is PAST (see, e.g., [1, 16, 24, 25]). However, our PLRFs differ from the standard notion of probabilistic ranking functions by considering arbitrary subsets $\mathcal{GT}_{\text{ni}} \subseteq \mathcal{GT}$. This is needed for the modularity of our approach, which allows us to analyze program parts separately (e.g., $\mathcal{GT} \setminus \mathcal{GT}_{\text{ni}}$ is ignored when inferring a PLRF). Thus, our "Boundedness" conditions differ slightly from the corresponding conditions in other definitions. Condition (b) requires that $g \in \mathcal{GT}_>$ never leads to a configuration where $r$ is negative. Condition (a) states that in an admissible path where $g = \{t_1, t_2, \ldots\} \in \mathcal{GT}_{\text{ni}} \setminus \mathcal{GT}_>$ is used for continuing in configuration $c'$, if executing $t_1$ in $c'$ makes $r$ negative, then executing $t_2$ must make $r$ negative as well. Thus, such a $g$ can never come before a general transition from $\mathcal{GT}_>$ in an admissible path and hence, $g$ can be ignored when inferring upper bounds on the runtime. This increases the power of our approach and it allows us to consider only non-negative random variables in our correctness proofs.

We use SMT solvers to generate PLRFs automatically. Then for "Boundedness", we regard all $s' \in \Sigma$ with $s'(\tau_g) = \mathbf{t}$ and require "Boundedness" for any state $s$ that is reachable from $s'$.

Example 13 (PLRFs). *Consider again the PIP in Fig. 1 and the sets $\mathcal{GT}_> = \mathcal{GT}_{\text{ni}} = \{g_1\}$ and $\mathcal{GT}'_> = \mathcal{GT}'_{\text{ni}} = \{g_3\}$, which correspond to its two loops.*

*The function $r$ with $r(\ell_1) = 2 \cdot x$ and $r(\ell_0) = r(\ell_2) = 0$ is a PLRF for $\mathcal{GT}_> = \mathcal{GT}_{\text{ni}}$: For every admissible configuration $(\ell, t, s)$ with $t \in g_1$ we have $\ell = \ell_1$ and $s(r(\ell_1)) = 2 \cdot s(x) \geq 0$, since $x$ was positive before (due to $g_1$'s guard) and it was either decreased by $1$ or not changed by the update of $t_1$ resp. $t_2$. Hence $r$ is bounded. Moreover, for $s_1(x) = s(x - 1) = s(x) - 1$ and $s_2(x) = s(x)$ we have:*

$$\exp_{r,g_1,s} = \tfrac{1}{2} \cdot s_1(r(\ell_1)) + \tfrac{1}{2} \cdot s_2(r(\ell_1)) = 2 \cdot s(x) - 1 = s(r(\ell_1)) - 1$$

*So $r$ is decreasing on $g_1$ and as $\mathcal{GT}_> = \mathcal{GT}_{\text{ni}}$, the non-increase property also holds. Similarly, $r'$ with $r'(\ell_2) = y$ and $r'(\ell_0) = r'(\ell_1) = 0$ is a PLRF for $\mathcal{GT}'_> = \mathcal{GT}'_{\text{ni}}$.*
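Because $r$ is linear, the Decrease condition can be verified exactly on coefficients instead of on sampled states. A minimal sketch (hypothetical representation: a linear expression $a \cdot x + b$ as the pair `(a, b)`) replays this check for $g_1$ from Ex. 13:

```python
from fractions import Fraction

# Coefficient-level check of the Decrease condition of Def. 12 for g1.
# Since all expressions are linear in x, validity under the integer guard
# x > 0 (i.e., x >= 1) reduces to two coefficient inequalities.

def linear_sub(e1, e2):
    """(a1*x + b1) - (a2*x + b2), represented as coefficient pairs."""
    return (e1[0] - e2[0], e1[1] - e2[1])

def nonneg_for_positive_x(e):
    """a*x + b >= 0 for all integers x >= 1 iff a >= 0 and a + b >= 0."""
    a, b = e
    return a >= 0 and a + b >= 0

r_l1 = (Fraction(2), Fraction(0))            # r(l1) = 2*x
exp_r_g1 = (Fraction(2), Fraction(-1))       # exp_{r,g1,s} = 2*x - 1 (Ex. 13)
# Decrease: r(l1) - 1 - exp_{r,g1,s} >= 0 under the guard x > 0.
decrease = linear_sub(linear_sub(r_l1, (0, 1)), exp_r_g1)
print(nonneg_for_positive_x(decrease))       # True: g1 decreases r
```

An SMT solver performs essentially this reasoning in general, searching for the coefficients of $r$ such that the resulting constraints are valid.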

In our implementation, $\mathcal{GT}_>$ is always a singleton and we let $\mathcal{GT}_{\text{ni}} \subseteq \mathcal{GT}$ be a cycle in the call graph where we find a PLRF for $\mathcal{GT}_> \subseteq \mathcal{GT}_{\text{ni}}$. The next subsection shows how we can then obtain an expected runtime bound for the overall program by searching for suitable ranking functions repeatedly.

#### **4.2 Inferring Expected Runtime Bounds**

Our approach to infer expected runtime bounds is based on an underlying (non-probabilistic) bound pair $(\mathcal{RB}, \mathcal{SB})$ which is computed by existing techniques (in our implementation, we use [18]). To do so, we abstract the PIP to a standard integer transition system by ignoring the probabilities of transitions and replacing probabilistic with non-deterministic sampling (e.g., the update $\eta(x) = \mathcal{GEO}(\tfrac{1}{2})$ would be replaced by $\eta(x) = x + u$ with $u \in \mathcal{V} \setminus \mathcal{PV}$, where $u > 0$ is added to the guard). Of course, we usually have $\mathcal{RB}(t) = \infty$ for some transitions $t$.

We start with the expected bound pair $(\mathcal{RB}_{\mathbb{E}}, \mathcal{SB}_{\mathbb{E}})$ that is obtained by lifting $(\mathcal{RB}, \mathcal{SB})$ as in Thm. 10. Afterwards, the expected runtime bound $\mathcal{RB}_{\mathbb{E}}$ is improved repeatedly by applying the following Thm. 16 (and similarly, $\mathcal{SB}_{\mathbb{E}}$ is improved repeatedly by applying Thm. 23 and 25 from Sect. 5). Our approach alternates the improvement of $\mathcal{RB}_{\mathbb{E}}$ and $\mathcal{SB}_{\mathbb{E}}$, and it uses expected size bounds on "previous" transitions to improve expected runtime bounds, and vice versa.

To improve $\mathcal{RB}_{\mathbb{E}}$, we generate a PLRF $r$ for a part of the program. To obtain a bound for the full program from $r$, one has to determine which transitions can enter the program part and from which locations it can be entered.

**Definition 14 (Entry Locations and Transitions).** For $\mathcal{GT}_{\text{ni}} \subseteq \mathcal{GT}$ and $\ell \in \mathcal{L}$, the entry transitions are $\mathcal{ET}_{\mathcal{GT}_{\text{ni}}}(\ell) = \{g \in \mathcal{GT} \setminus \mathcal{GT}_{\text{ni}} \mid \exists t \in g.\ t = (\_,\_,\_,\_,\ell)\}$. Then the entry locations are all start locations of $\mathcal{GT}_{\text{ni}}$ whose entry transitions are not empty, i.e., $\mathcal{EL}_{\mathcal{GT}_{\text{ni}}} = \{\ell \mid \mathcal{ET}_{\mathcal{GT}_{\text{ni}}}(\ell) \neq \emptyset \wedge (\ell,\_,\_,\_,\_) \in \bigcup \mathcal{GT}_{\text{ni}}\}$.<sup>1</sup>

Example 15 (Entry Locations and Transitions). *For the PIP from Fig. 1 and $\mathcal{GT}_{\text{ni}} = \{g_1\}$, we have $\mathcal{EL}_{\mathcal{GT}_{\text{ni}}} = \{\ell_1\}$ and $\mathcal{ET}_{\mathcal{GT}_{\text{ni}}}(\ell_1) = \{g_0\}$. So the loop formed by $g_1$ is entered at location $\ell_1$ and the general transition $g_0$ has to be executed before. Similarly, for $\mathcal{GT}'_{\text{ni}} = \{g_3\}$ we have $\mathcal{EL}_{\mathcal{GT}'_{\text{ni}}} = \{\ell_2\}$ and $\mathcal{ET}_{\mathcal{GT}'_{\text{ni}}}(\ell_2) = \{g_2\}$.*

Recall that if $r$ is a PLRF for $\mathcal{GT}_> \subseteq \mathcal{GT}_{\text{ni}}$, then in a program that is restricted to $\mathcal{GT}_{\text{ni}}$, $r(\ell)$ is an upper bound on the expected number of executions of transitions from $\mathcal{GT}_>$ when starting in $\ell$. Since $r(\ell)$ may contain negative coefficients, it is not weakly monotonically increasing in general. To turn expressions $e \in \mathbb{R}[\mathcal{PV}]$ into bounds from $\mathcal{B}$, let the over-approximation $\lceil \cdot \rceil$ replace all coefficients by their absolute value. So for example, $\lceil x - y \rceil = \lceil x + (-1) \cdot y \rceil = x + y$. Clearly, we have $|s|(\lceil e \rceil) \geq |s|(e)$ for all $s \in \Sigma$. Moreover, if $e \in \mathbb{R}[\mathcal{PV}]$ then $\lceil e \rceil \in \mathcal{B}$.

To turn $\lceil r(\ell) \rceil$ into a bound for the full program, one has to take into account how often the sub-program with the transitions $\mathcal{GT}_{\text{ni}}$ is reached via an entry transition $h \in \mathcal{ET}_{\mathcal{GT}_{\text{ni}}}(\ell)$ for some $\ell \in \mathcal{EL}_{\mathcal{GT}_{\text{ni}}}$. This can be over-approximated by $\sum_{t = (\_,\_,\_,\_,\ell) \in h} \mathcal{RB}(t)$, which is an upper bound on the number of times that transitions in $h$ to the entry location $\ell$ of $\mathcal{GT}_{\text{ni}}$ are applied in a full program run.

The bound $\lceil r(\ell) \rceil$ is expressed in terms of the program variables at the entry location $\ell$ of $\mathcal{GT}_{\text{ni}}$. To obtain a bound in terms of the variables at the start of the program, one has to take into account which value a program variable $x$ may have when the sub-program $\mathcal{GT}_{\text{ni}}$ is reached. For every entry transition $h \in \mathcal{ET}_{\mathcal{GT}_{\text{ni}}}(\ell)$, this value can be over-approximated by $\mathcal{SB}_{\mathbb{E}}(h, \ell, x)$. Thus, we have to instantiate each variable $x$ in $\lceil r(\ell) \rceil$ by $\mathcal{SB}_{\mathbb{E}}(h, \ell, x)$. Let $\mathcal{SB}_{\mathbb{E}}(h, \ell, \cdot) : \mathcal{PV} \to \mathcal{B}$ be the mapping with $\mathcal{SB}_{\mathbb{E}}(h, \ell, \cdot)(x) = \mathcal{SB}_{\mathbb{E}}(h, \ell, x)$. Hence, $\mathcal{SB}_{\mathbb{E}}(h, \ell, \cdot)(\lceil r(\ell) \rceil)$ over-approximates the expected number of applications of $\mathcal{GT}_>$ if $\mathcal{GT}_{\text{ni}}$ is entered in location $\ell$, where this bound is expressed in terms of the input variables of the program. Here, weak monotonic increase of $\lceil r(\ell) \rceil$ ensures that instantiating its variables by an over-approximation of their size yields an over-approximation of the runtime.

**Theorem 16 (Expected Runtime Bounds).** Let $(\mathcal{RB}_{\mathbb{E}}, \mathcal{SB}_{\mathbb{E}})$ be an expected bound pair, $\mathcal{RB}$ a (non-probabilistic) runtime bound, and $r$ a PLRF for $\mathcal{GT}_> \subseteq \mathcal{GT}_{\text{ni}} \subseteq \mathcal{GT}$. Then $\mathcal{RB}'_{\mathbb{E}} : \mathcal{GT} \to \mathcal{B}$ is an expected runtime bound where

$$\mathcal{RB}'_{\mathbb{E}}(g) = \begin{cases} \sum\limits_{\substack{\ell \in \mathcal{EL}_{\mathcal{GT}_{\text{ni}}} \\ h \in \mathcal{ET}_{\mathcal{GT}_{\text{ni}}}(\ell)}} \left( \sum\limits_{t = (\_,\_,\_,\_,\ell) \in h} \mathcal{RB}(t) \right) \cdot \mathcal{SB}_{\mathbb{E}}(h, \ell, \cdot)\left( \lceil r(\ell) \rceil \right), & \text{if } g \in \mathcal{GT}_> \\ \mathcal{RB}_{\mathbb{E}}(g), & \text{if } g \notin \mathcal{GT}_> \end{cases}$$

Example 17 (Expected Runtime Bounds). *For the PIP from Fig. 1, our approach starts with $(\mathcal{RB}_{\mathbb{E}}, \mathcal{SB}_{\mathbb{E}})$ from Ex. 11, which results from lifting the bound pair from Ex. 5. To improve the bound $\mathcal{RB}_{\mathbb{E}}(g_1) = \infty$, we use the PLRF $r$ for $\mathcal{GT}_> = \mathcal{GT}_{\text{ni}} = \{g_1\}$ from Ex. 13. By Ex. 15, we have $\mathcal{EL}_{\mathcal{GT}_{\text{ni}}} = \{\ell_1\}$ and $\mathcal{ET}_{\mathcal{GT}_{\text{ni}}}(\ell_1) = \{g_0\}$ with $g_0 = \{t_0\}$, whose runtime bound is $\mathcal{RB}(t_0) = 1$, see Ex. 5. Using the expected size bound $\mathcal{SB}_{\mathbb{E}}(g_0, \ell_1, x) = x$ from Ex. 9, Thm. 16 yields*

$$\mathcal{RB}'_{\mathbb{E}}(g_1) = \mathcal{RB}(t_0) \cdot \mathcal{SB}_{\mathbb{E}}(g_0, \ell_1, \cdot)\left(\lceil r(\ell_1) \rceil\right) = 1 \cdot 2 \cdot x = 2 \cdot x.$$

<sup>1</sup> For a set of sets like $\mathcal{GT}_{\text{ni}}$, $\bigcup \mathcal{GT}_{\text{ni}}$ denotes their union, i.e., $\bigcup \mathcal{GT}_{\text{ni}} = \bigcup_{g \in \mathcal{GT}_{\text{ni}}} g$.

*To improve $\mathcal{RB}_{\mathbb{E}}(g_3)$, we use the PLRF $r'$ for $\mathcal{GT}'_> = \mathcal{GT}'_{\text{ni}} = \{g_3\}$ from Ex. 13. As $\mathcal{EL}_{\mathcal{GT}'_{\text{ni}}} = \{\ell_2\}$ and $\mathcal{ET}_{\mathcal{GT}'_{\text{ni}}}(\ell_2) = \{g_2\}$ by Ex. 15, where $g_2 = \{t_3\}$ and $\mathcal{RB}(t_3) = 1$ (Ex. 5), with the bound $\mathcal{SB}_{\mathbb{E}}(g_2, \ell_2, y) = 6 \cdot x^2 + 2 \cdot y$ from Ex. 9, Thm. 16 yields*

$$\mathcal{R}\mathcal{B}\_{\mathbb{E}}'(g\_3) = \mathcal{R}\mathcal{B}(t\_3) \cdot \mathcal{S}\mathcal{B}\_{\mathbb{E}}(g\_2, \ell\_2, \cdot) \left( \lceil \mathbf{r}'(\ell\_2) \rceil \right) = 1 \cdot \mathcal{S}\mathcal{B}\_{\mathbb{E}}(g\_2, \ell\_2, y) = 6 \cdot x^2 + 2 \cdot y.$$

*So based on the expected size bounds of Ex. 9, we have shown how to compute the expected runtime bounds of Ex. 9 automatically.*
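As a small illustration (hypothetical names; bounds are modeled as Python functions of the initial values $x, y$, and substituting a size bound into the monotonic bound $\lceil r(\ell) \rceil$ becomes plain function composition), the two computations of Ex. 17 can be replayed:

```python
# Sketch of Thm. 16's bound computation for the PIP of Fig. 1.

def runtime_bound(rb_entry, sb_e_entry, ceil_r):
    """RB'_E(g) = (sum of RB(t) over the entry h) * SB_E(h, l, .)([r(l)])."""
    return lambda x, y: rb_entry * ceil_r(sb_e_entry(x, y))

# Improving RB_E(g1): entry g0 = {t0} with RB(t0) = 1,
# SB_E(g0, l1, x) = x, and [r(l1)] = 2*x.
rb_e_g1 = runtime_bound(1, lambda x, y: x, lambda v: 2 * v)

# Improving RB_E(g3): entry g2 = {t3} with RB(t3) = 1,
# SB_E(g2, l2, y) = 6*x^2 + 2*y, and [r'(l2)] = y.
rb_e_g3 = runtime_bound(1, lambda x, y: 6 * x**2 + 2 * y, lambda v: v)

print(rb_e_g1(5, 3))   # 2*x     = 10
print(rb_e_g3(5, 3))   # 6*x^2 + 2*y = 156
```

Note that the entry multiplier uses the non-probabilistic bound $\mathcal{RB}(t_0)$ resp. $\mathcal{RB}(t_3)$, not an expected bound; the next paragraph explains why this matters.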

Similar to [18], our approach relies on combining bounds that were computed earlier in order to derive new bounds. Here, bounds may be combined linearly, bounds may be multiplied, and bounds may even be substituted into other bounds. But in contrast to [18], sometimes one may combine expected bounds that were computed earlier and sometimes it is only sound to combine non-probabilistic bounds: If a new bound is computed by linear combinations of earlier bounds, then it is sound to use the "expected versions" of these earlier bounds. However, if two bounds are multiplied, then it is in general not sound to use their "expected versions". Thus, it would be unsound to use the expected runtime bounds $\mathcal{RB}_{\mathbb{E}}(h)$ instead of the non-probabilistic bounds $\sum_{t = (\_,\_,\_,\_,\ell) \in h} \mathcal{RB}(t)$ on the entry transitions in Thm. 16 (a counterexample is given in [47]).<sup>2</sup>

In general, if bounds $b_1, \ldots, b_n$ are substituted into another bound $b$, then it is sound to use "expected versions" of the bounds $b_1, \ldots, b_n$ if $b$ is concave, see, e.g., [10, 11, 40]. Since bounds from $\mathcal{B}$ do not contain negative coefficients, we obtain that a finite<sup>3</sup> bound $b \in \mathcal{B}$ is concave iff it is a linear polynomial (see [47]).

Thus, in Thm. 16 we may substitute expected size bounds $\mathcal{SB}_{\mathbb{E}}(h, \ell, x)$ into $\lceil r(\ell) \rceil$, since we restricted ourselves to linear ranking functions $r$ and hence, $\lceil r(\ell) \rceil$ is also linear. Note that in contrast to [11], where a notion of concavity was used to analyze probabilistic term rewriting, a multilinear expression like $x \cdot y$ is not concave when regarding both arguments simultaneously. Hence, it is unsound to use such ranking functions in Thm. 16. See [47] for a counterexample which shows why substituting expected bounds into a non-linear bound is incorrect in general.

### **5 Computing Expected Size Bounds**

We first compute local bounds for one application of a transition (Sect. 5.1). To turn them into global bounds, we encode the data flow of a PIP in a graph. Sect. 5.2 then presents our technique to compute expected size bounds.

#### **5.1 Local Change Bounds and General Result Variable Graph**

We first compute a bound on the expected change of a variable during an update. More precisely, for every general result variable $(g, \ell, x)$ we define a bound $\mathcal{CB}_{\mathbb{E}}(g, \ell, x)$ on the change of the variable $x$ that we can expect in one execution of the general transition $g$ when reaching location $\ell$. So we consider all $t = (\_, p, \_, \eta, \ell) \in g$ and the expected difference between the current value of $x$ and its update $\eta(x)$. However, for $\eta(x) \in \mathbb{Z}[\mathcal{V}]$, $\eta(x) - x$ is not necessarily from $\mathcal{B}$ because it may contain negative coefficients. Thus, we use the over-approximation $\lceil \eta(x) - x \rceil$ (where we always simplify expressions before applying $\lceil \cdot \rceil$, e.g., $\lceil x - x \rceil = \lceil 0 \rceil = 0$). Moreover, $\lceil \eta(x) - x \rceil$ may contain temporary variables. Let $\operatorname{tv}_t : \mathcal{V} \to \mathcal{B}$ instantiate all temporary variables by the largest possible value they can have after evaluating the transition $t$. Hence, we then use $\operatorname{tv}_t(\lceil \eta(x) - x \rceil)$ instead. For $\operatorname{tv}_t$, we have to use the underlying non-probabilistic size bound $\mathcal{SB}$ for the program (since the scheduler determines the values of temporary variables by non-deterministic (non-probabilistic) choice). If $x$ is updated according to a bounded distribution function $d \in \mathcal{D}$, then as in Sect. 2, let $\mathcal{E}(d) \in \mathcal{B}$ denote a finite bound on $d$, i.e., $\mathbb{E}(|d(s)|) \leq |s|(\mathcal{E}(d))$ for all $s \in \Sigma$.

<sup>2</sup> An exception is the special case where $r(\ell)$ is *constant*. Then, our implementation indeed uses the expected bound $\mathcal{RB}_{\mathbb{E}}(h)$ instead of $\sum_{t = (\_,\_,\_,\_,\ell) \in h} \mathcal{RB}(t)$ [47].

<sup>3</sup> A bound is *finite* if it does not contain $\infty$. We always simplify expressions and thus, a bound like $0 \cdot \infty$ is also finite, because it simplifies to $0$, as usual in measure theory.

**Definition 18 (Expected Local Change Bound).** Let $\mathcal{SB}$ be a size bound. Then $\mathcal{CB}_{\mathbb{E}} : \mathcal{GRV} \to \mathcal{B}$ with $\mathcal{CB}_{\mathbb{E}}(g, \ell, x) = \sum_{t = (\_, p, \_, \eta, \ell) \in g} p \cdot \operatorname{ch}_t(\eta(x), x)$, where

$$\operatorname{ch}_t(\eta(x), x) = \begin{cases} \mathcal{E}(d), & \text{if } \eta(x) = d \in \mathcal{D} \\ \operatorname{tv}_t(\lceil \eta(x) - x \rceil), & \text{otherwise} \end{cases} \;\text{ and }\; \operatorname{tv}_t(y) = \begin{cases} \mathcal{SB}(t, y), & \text{if } y \notin \mathcal{PV} \\ y, & \text{if } y \in \mathcal{PV} \end{cases}$$

Example 19 ($\mathcal{CB}_{\mathbb{E}}$). *For the PIP of Fig. 1, we have $\mathcal{CB}_{\mathbb{E}}(g_0, \_, \_) = \mathcal{CB}_{\mathbb{E}}(g_2, \_, \_) = \mathcal{CB}_{\mathbb{E}}(g_3, \ell_2, x) = 0$, since the respective updates are identities. Moreover,*

$$\mathcal{CB}\_{\mathbb{E}}(g\_1, \ell\_1, x) = \frac{1}{2} \cdot \lceil (x - 1) - x \rceil + \frac{1}{2} \cdot \lceil x - x \rceil = \frac{1}{2} \cdot 1 + \frac{1}{2} \cdot 0 = \frac{1}{2}.$$

*In a similar way, we obtain $\mathcal{CB}_{\mathbb{E}}(g_1, \ell_1, y) = x$ and $\mathcal{CB}_{\mathbb{E}}(g_3, \ell_2, y) = 1$.*
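For deterministic updates without temporary variables, Def. 18 boils down to a probability-weighted sum of simplified change expressions. A minimal sketch (hypothetical encoding: each transition of $g$ contributes a pair of its probability and the already-simplified change of $x$, with `abs` playing the role of $\lceil \cdot \rceil$ on constants):

```python
from fractions import Fraction

# Sketch of Def. 18 for g1 of Fig. 1 (cf. Ex. 19): the change of x under t1
# is (x - 1) - x = -1 and under t2 it is x - x = 0, each with probability 1/2.

def local_change_bound(g):
    return sum(Fraction(p) * abs(delta) for p, delta in g)

g1_changes_x = [(Fraction(1, 2), -1), (Fraction(1, 2), 0)]
print(local_change_bound(g1_changes_x))   # 1/2, as in Ex. 19
```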

The following theorem shows that for any admissible configuration in a state $s'$, $\mathcal{CB}_{\mathbb{E}}(g, \ell, x)$ is an upper bound on the expected value of $|s(x) - s'(x)|$ if $s$ is the next state obtained when applying $g$ in state $s'$ to reach location $\ell$.

**Theorem 20 (Soundness of $\mathcal{CB}_{\mathbb{E}}$).** For any $(g, \ell, x) \in \mathcal{GRV}$, scheduler $\mathfrak{S}$, $s_0 \in \Sigma$, and admissible configuration $c' = (\_, \_, s')$, we have

$$|s'|\left(\mathcal{CB}_{\mathbb{E}}(g, \ell, x)\right) \;\geq \sum_{c = (\ell, t, s) \in \mathsf{Conf},\; t \in g} \operatorname{pr}_{\mathfrak{S}, s_0}(c' \to c) \cdot |s(x) - s'(x)|.$$

To obtain global bounds from the local bounds $\mathcal{CB}_{\mathbb{E}}(g, \ell, x)$, we construct a general result variable graph which encodes the data flow between variables. Let $\operatorname{pre}(g) = \mathcal{ET}_\emptyset(\ell_g)$ be the set of pre-transitions of $g$ which lead into $g$'s start location $\ell_g$. Moreover, for $\alpha = (g, \ell, x) \in \mathcal{GRV}$ let its active variables $\operatorname{actV}(\alpha)$ consist of all variables occurring in the bound $x + \mathcal{CB}_{\mathbb{E}}(\alpha)$ for $\alpha$'s expected size.

**Definition 21 (General Result Variable Graph).** The general result variable graph has the set of nodes GRV and the set of edges GRVE, where

$$\mathcal{GRV}\mathcal{E} = \{ ((g',\ell',x'), (g,\ell,x)) \mid g' \in \text{pre}(g) \land \ell' = \ell\_g \land x' \in \text{act}\,\text{V}(g,\ell,x) \}.$$

Example 22 (General Result Variable Graph). *The general result variable graph for the PIP of Fig. 1 is shown below. For $\mathcal{CB}_{\mathbb{E}}$ from Ex. 19, we have $\operatorname{actV}(g_1, \ell_1, x) = \{x\}$, as $x + \mathcal{CB}_{\mathbb{E}}(\alpha) = x + \tfrac{1}{2}$ contains no variable except $x$. Similarly, $\operatorname{actV}(g_1, \ell_1, y) = \{x, y\}$, as $x$ and $y$ are contained in $y + \mathcal{CB}_{\mathbb{E}}(g_1, \ell_1, y) = y + x$. For all other $\alpha \in \mathcal{GRV}$, we have $\operatorname{actV}(\_, \_, x) = \{x\}$ and $\operatorname{actV}(\_, \_, y) = \{y\}$. As $\operatorname{pre}(g_1) = \{g_0, g_1\}$, the graph captures the dependence of $(g_1, \ell_1, x)$ on $(g_0, \ell_1, x)$ and $(g_1, \ell_1, x)$, and of $(g_1, \ell_1, y)$ on $(g_0, \ell_1, x)$, $(g_0, \ell_1, y)$, $(g_1, \ell_1, x)$, and $(g_1, \ell_1, y)$. The other edges are obtained in a similar way.*

#### **5.2 Inferring Expected Size Bounds**

We now compute global expected size bounds for the general result variables by considering the SCCs of the general result variable graph separately. As usual, an SCC is a maximal subgraph with a path from each node to every other node. An SCC is trivial if it consists of a single node without an edge to itself. We first handle trivial SCCs in Sect. 5.2.1 and consider non-trivial SCCs in Sect. 5.2.2.
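To make the SCC-based processing concrete, the following sketch (pure Python; the graph of Ex. 22 is encoded by hand as explicit edge tuples, which is an assumption about the figure's exact edge set) computes the SCCs with Tarjan's algorithm and counts the trivial ones:

```python
# Decompose the general result variable graph into SCCs and classify them
# as trivial (single node, no self-edge) or non-trivial.

def sccs(nodes, edges):
    succ = {n: [] for n in nodes}
    for a, b in edges:
        succ[a].append(b)
    index, low, stack, on, out, cnt = {}, {}, [], set(), [], [0]

    def visit(v):                       # Tarjan's strongly connected components
        index[v] = low[v] = cnt[0]; cnt[0] += 1
        stack.append(v); on.add(v)
        for w in succ[v]:
            if w not in index:
                visit(w); low[v] = min(low[v], low[w])
            elif w in on:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            comp = []
            while True:
                w = stack.pop(); on.discard(w); comp.append(w)
                if w == v:
                    break
            out.append(comp)

    for n in nodes:
        if n not in index:
            visit(n)
    return out

nodes = [(g, l, v) for g, l in [("g0", "l1"), ("g1", "l1"),
                                ("g2", "l2"), ("g3", "l2")] for v in "xy"]
edges = [(("g0", "l1", "x"), ("g1", "l1", "x")),
         (("g1", "l1", "x"), ("g1", "l1", "x")),
         (("g0", "l1", "x"), ("g1", "l1", "y")),
         (("g0", "l1", "y"), ("g1", "l1", "y")),
         (("g1", "l1", "x"), ("g1", "l1", "y")),
         (("g1", "l1", "y"), ("g1", "l1", "y")),
         (("g0", "l1", "x"), ("g2", "l2", "x")),
         (("g1", "l1", "x"), ("g2", "l2", "x")),
         (("g0", "l1", "y"), ("g2", "l2", "y")),
         (("g1", "l1", "y"), ("g2", "l2", "y")),
         (("g2", "l2", "x"), ("g3", "l2", "x")),
         (("g3", "l2", "x"), ("g3", "l2", "x")),
         (("g2", "l2", "y"), ("g3", "l2", "y")),
         (("g3", "l2", "y"), ("g3", "l2", "y"))]
edge_set = set(edges)
trivial = [c for c in sccs(nodes, edges)
           if len(c) == 1 and (c[0], c[0]) not in edge_set]
print(len(trivial))   # 4 trivial SCCs (the g0 and g2 result variables)
```

Tarjan's algorithm emits the SCCs in reverse topological order, so reversing its output yields exactly the processing order required below.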

**5.2.1 Inferring Expected Size Bounds for Trivial SCCs** By Thm. 20, $x + \mathcal{CB}_{\mathbb{E}}(g, \ell, x)$ is a local bound on the expected value of $x$ after applying $g$ once in order to enter $\ell$. However, this bound is formulated in terms of the values of the variables immediately before applying $g$. We now want to compute global bounds in terms of the initial values of the variables at the start of the program.

If $g$ is initial (i.e., $g \in \mathcal{GT}_0$ since $g$ starts in the initial location $\ell_0$), then $x + \mathcal{CB}_{\mathbb{E}}(g, \ell, x)$ is already a global bound, as the values of the variables before the application of $g$ are the initial values of the variables at the program start.

Otherwise, the variables $y$ occurring in the local bound $x + \mathcal{CB}_{\mathbb{E}}(g, \ell, x)$ have to be replaced by the values that they can take in a full program run before applying the transition $g$. Thus, we have to consider all transitions $h \in \operatorname{pre}(g)$ and instantiate every variable $y$ by the maximum of the values that $y$ can have after applying $h$. Here, we again over-approximate the maximum by the sum.

If $\mathcal{CB}_{\mathbb{E}}(g, \ell, x)$ is concave (i.e., a linear polynomial), then we can instantiate its variables by expected size bounds $\mathcal{SB}_{\mathbb{E}}(h, \ell_g, y)$. However, this is unsound if $\mathcal{CB}_{\mathbb{E}}(g, \ell, x)$ is not linear, i.e., not concave (see [47] for a counterexample). So in this case, we have to use non-probabilistic bounds $\mathcal{SB}(t, y)$ instead.

As in Sect. 4.2, we use an underlying non-probabilistic bound pair $(\mathcal{RB}, \mathcal{SB})$ and start with the expected bound pair $(\mathcal{RB}_{\mathbb{E}}, \mathcal{SB}_{\mathbb{E}})$ obtained by lifting $(\mathcal{RB}, \mathcal{SB})$ according to Thm. 10. While Thm. 16 improves $\mathcal{RB}_{\mathbb{E}}$, we now improve $\mathcal{SB}_{\mathbb{E}}$. Here, the SCCs of the general result variable graph should be treated in topological order, since then one may first improve $\mathcal{SB}_{\mathbb{E}}$ for result variables corresponding to $\operatorname{pre}(g)$, and use that when improving $\mathcal{SB}_{\mathbb{E}}$ for result variables of the form $(g, \_, \_)$.

**Theorem 23 (Expected Size Bounds for Trivial SCCs).** Let $\mathcal{SB}_{\mathbb{E}}$ be an expected size bound, $\mathcal{SB}$ a (non-probabilistic) size bound, and let $\alpha = (g, \ell, x)$ form a trivial SCC of the general result variable graph. Let $\operatorname{size}^\alpha_{\mathbb{E}}$ and $\operatorname{size}^\alpha$ be mappings from $\mathcal{PV} \to \mathcal{B}$ with $\operatorname{size}^\alpha_{\mathbb{E}}(y) = \sum_{h \in \operatorname{pre}(g)} \mathcal{SB}_{\mathbb{E}}(h, \ell_g, y)$ and $\operatorname{size}^\alpha(y) = \sum_{h \in \operatorname{pre}(g),\ t = (\_,\_,\_,\_,\ell_g) \in h} \mathcal{SB}(t, y)$. Then $\mathcal{SB}'_{\mathbb{E}} : \mathcal{GRV} \to \mathcal{B}$ is an expected size bound, where $\mathcal{SB}'_{\mathbb{E}}(\beta) = \mathcal{SB}_{\mathbb{E}}(\beta)$ for $\beta \neq \alpha$ and

$$\mathcal{SB}'_{\mathbb{E}}(\alpha) = \begin{cases} x + \mathcal{CB}_{\mathbb{E}}(\alpha), & \text{if } g \in \mathcal{GT}_0 \\ \operatorname{size}^{\alpha}_{\mathbb{E}}\left(x + \mathcal{CB}_{\mathbb{E}}(\alpha)\right), & \text{if } g \notin \mathcal{GT}_0 \text{ and } \mathcal{CB}_{\mathbb{E}}(\alpha) \text{ is linear} \\ \operatorname{size}^{\alpha}_{\mathbb{E}}(x) + \operatorname{size}^{\alpha}\left(\mathcal{CB}_{\mathbb{E}}(\alpha)\right), & \text{if } g \notin \mathcal{GT}_0 \text{ and } \mathcal{CB}_{\mathbb{E}}(\alpha) \text{ is not linear} \end{cases}$$

Example 24 ($\mathcal{SB}_{\mathbb{E}}$ for Trivial SCCs). *The general result variable graph in Ex. 22 contains 4 trivial SCCs formed by $\alpha_x = (g_0, \ell_1, x)$, $\alpha_y = (g_0, \ell_1, y)$, $\beta_x = (g_2, \ell_2, x)$, and $\beta_y = (g_2, \ell_2, y)$. For all these general result variables, the expected local change bound $\mathcal{CB}_{\mathbb{E}}$ is $0$ (see Ex. 19). Thus, it is linear. Since $g_0 \in \mathcal{GT}_0$, Thm. 23 yields $\mathcal{SB}'_{\mathbb{E}}(\alpha_x) = x + \mathcal{CB}_{\mathbb{E}}(\alpha_x) = x$ and $\mathcal{SB}'_{\mathbb{E}}(\alpha_y) = y + \mathcal{CB}_{\mathbb{E}}(\alpha_y) = y$.*

*By treating SCCs in topological order, when handling $\beta_x$, $\beta_y$, we can assume that we already have $\mathcal{SB}_{\mathbb{E}}(\alpha_x) = x$, $\mathcal{SB}_{\mathbb{E}}(\alpha_y) = y$ and $\mathcal{SB}_{\mathbb{E}}(g_1, \ell_1, x) = 2 \cdot x$, $\mathcal{SB}_{\mathbb{E}}(g_1, \ell_1, y) = 6 \cdot x^2 + y$ (see Ex. 9) for the result variables corresponding to $\operatorname{pre}(g_2) = \{g_0, g_1\}$. We will explain in Sect. 5.2.2 how to compute such expected size bounds for non-trivial SCCs. Hence, by Thm. 23 we obtain $\mathcal{SB}'_{\mathbb{E}}(\beta_x) = \operatorname{size}^{\beta_x}_{\mathbb{E}}(x + \mathcal{CB}_{\mathbb{E}}(\beta_x)) = \mathcal{SB}_{\mathbb{E}}(\alpha_x) + \mathcal{SB}_{\mathbb{E}}(g_1, \ell_1, x) = 3 \cdot x$ and $\mathcal{SB}'_{\mathbb{E}}(\beta_y) = \operatorname{size}^{\beta_y}_{\mathbb{E}}(y + \mathcal{CB}_{\mathbb{E}}(\beta_y)) = \mathcal{SB}_{\mathbb{E}}(\alpha_y) + \mathcal{SB}_{\mathbb{E}}(g_1, \ell_1, y) = 6 \cdot x^2 + 2 \cdot y$.*

**5.2.2 Inferring Expected Size Bounds for Non-Trivial SCCs** Now we handle non-trivial SCCs $C$ of the general result variable graph. An upper bound for the expected size of a variable $x$ when entering $C$ is obtained from $\mathcal{SB}_{\mathbb{E}}(\beta)$ for all general result variables $\beta = (\_, \_, x)$ which have an edge to $C$.

To turn $\mathcal{CB}_{\mathbb{E}}(g, \ell, x)$ into a global bound, as in Thm. 23 its variables $y$ have to be instantiated by the values $\operatorname{size}^{(g, \ell, x)}(y)$ that they can take in a full program run before applying a transition from $g$. Thus, $\operatorname{size}^{(g, \ell, x)}(\mathcal{CB}_{\mathbb{E}}(g, \ell, x))$ is a global bound on the expected change resulting from one application of $g$. To obtain an upper bound for the whole SCC $C$, we add up these global bounds for all $(g, \ell, x) \in C$ and take into account how often the general transitions in the SCC are expected to be executed, i.e., we multiply with their expected runtime bound $\mathcal{RB}_{\mathbb{E}}(g)$. So while in Thm. 16 we improve $\mathcal{RB}_{\mathbb{E}}$ using expected size bounds for previous transitions, we now improve $\mathcal{SB}_{\mathbb{E}}(C)$ using expected runtime bounds for the transitions in $C$ and expected size bounds for previous transitions.

**Theorem 25 (Expected Size Bounds for Non-Trivial SCCs).** Let $(\mathcal{RB}_{\mathbb{E}}, \mathcal{SB}_{\mathbb{E}})$ be an expected bound pair, $(\mathcal{RB}, \mathcal{SB})$ a (non-probabilistic) bound pair, and let $C \subseteq \mathcal{GRV}$ form a non-trivial SCC of the general result variable graph where $\mathcal{GT}_C = \{g \in \mathcal{GT} \mid (g, \_, \_) \in C\}$. Then $\mathcal{SB}'_{\mathbb{E}}$ is an expected size bound:

$$\mathcal{SB}'_{\mathbb{E}}(\alpha) = \begin{cases} \sum\limits_{\substack{(\beta, \alpha') \in \mathcal{GRVE},\ \beta \notin C, \\ \alpha' \in C,\ \beta = (\_,\_,x)}} \mathcal{SB}_{\mathbb{E}}(\beta) \;+\; \sum\limits_{g \in \mathcal{GT}_C} \mathcal{RB}_{\mathbb{E}}(g) \cdot \left( \sum\limits_{\alpha' = (g,\_,x) \in C} \operatorname{size}^{\alpha'}\left(\mathcal{CB}_{\mathbb{E}}(\alpha')\right) \right), & \text{if } \alpha = (\_,\_,x) \in C \\ \mathcal{SB}_{\mathbb{E}}(\alpha), & \text{otherwise} \end{cases}$$

Here we really have to use the non-probabilistic size bound $\operatorname{size}^{\alpha'}$ instead of $\operatorname{size}^{\alpha'}_{\mathbb{E}}$, even if $\mathcal{CB}_{\mathbb{E}}(\alpha')$ is linear, i.e., concave. Otherwise we would multiply the expected values of two random variables which are not independent.

Example 26 ($\mathcal{SB}_{\mathbb{E}}$ for Non-Trivial SCCs). *The general result variable graph in Ex. 22 contains 4 non-trivial SCCs formed by $\alpha'_x = (g_1, \ell_1, x)$, $\alpha'_y = (g_1, \ell_1, y)$, $\beta'_x = (g_3, \ell_2, x)$, and $\beta'_y = (g_3, \ell_2, y)$. By the results on $\mathcal{SB}_{\mathbb{E}}$, $\mathcal{RB}_{\mathbb{E}}$, $\mathcal{CB}_{\mathbb{E}}$, and $\mathcal{SB}$ from Ex. 24, 17, 19, and 5, Thm. 25 yields the expected size bounds in Ex. 9:*

$$\begin{split} \mathcal{SB}'_{\mathbb{E}}(\alpha'_x) &= \mathcal{SB}_{\mathbb{E}}(\alpha_x) + \mathcal{RB}_{\mathbb{E}}(g_1) \cdot \operatorname{size}^{\alpha'_x}(\mathcal{CB}_{\mathbb{E}}(\alpha'_x)) = x + 2 \cdot x \cdot \tfrac{1}{2} = 2 \cdot x \\ \mathcal{SB}'_{\mathbb{E}}(\alpha'_y) &= \mathcal{SB}_{\mathbb{E}}(\alpha_y) + \mathcal{RB}_{\mathbb{E}}(g_1) \cdot \operatorname{size}^{\alpha'_y}(\mathcal{CB}_{\mathbb{E}}(\alpha'_y)) = y + 2 \cdot x \cdot \operatorname{size}^{\alpha'_y}(x) \\ &= y + 2 \cdot x \cdot \textstyle\sum_{i \in \{0,1,2\}} \mathcal{SB}(t_i, x) = 6 \cdot x^2 + y \\ \mathcal{SB}'_{\mathbb{E}}(\beta'_x) &= \mathcal{SB}_{\mathbb{E}}(\beta_x) + \mathcal{RB}_{\mathbb{E}}(g_3) \cdot \operatorname{size}^{\beta'_x}(\mathcal{CB}_{\mathbb{E}}(\beta'_x)) = 3 \cdot x + (6 \cdot x^2 + 2 \cdot y) \cdot 0 = 3 \cdot x \\ \mathcal{SB}'_{\mathbb{E}}(\beta'_y) &= \mathcal{SB}_{\mathbb{E}}(\beta_y) + \mathcal{RB}_{\mathbb{E}}(g_3) \cdot \operatorname{size}^{\beta'_y}(\mathcal{CB}_{\mathbb{E}}(\beta'_y)) = 6 \cdot x^2 + 2 \cdot y + (6 \cdot x^2 + 2 \cdot y) \cdot 1 \\ &= 12 \cdot x^2 + 4 \cdot y \end{split}$$
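The first two lines of Ex. 26 can be replayed with the same hypothetical function-based encoding of bounds used earlier (bounds as Python functions of the initial values $x, y$; exact arithmetic via `Fraction`):

```python
from fractions import Fraction

# Sketch of Thm. 25 for the SCCs {(g1, l1, x)} and {(g1, l1, y)} of Ex. 26:
# add the expected size bounds entering the SCC to RB_E(g1) times the
# (non-probabilistic) instantiation of the local change bound.

def scc_size_bound(entry_bounds, rb_e_g, change_bound):
    return lambda x, y: (sum(b(x, y) for b in entry_bounds)
                         + rb_e_g(x, y) * change_bound(x, y))

# SB'_E(a'_x) = SB_E(a_x) + RB_E(g1) * CB_E(g1, l1, x) = x + 2*x * 1/2
sb_e_g1_x = scc_size_bound([lambda x, y: x],
                           lambda x, y: 2 * x,
                           lambda x, y: Fraction(1, 2))
# SB'_E(a'_y) = SB_E(a_y) + RB_E(g1) * size(CB_E(g1, l1, y)) = y + 2*x * 3*x
sb_e_g1_y = scc_size_bound([lambda x, y: y],
                           lambda x, y: 2 * x,
                           lambda x, y: 3 * x)   # size^{a'_y}(x) = 3*x

print(sb_e_g1_x(4, 1))   # 2*x       = 8
print(sb_e_g1_y(4, 1))   # 6*x^2 + y = 97
```

The multiplication `rb_e_g * change_bound` is exactly the step for which the non-probabilistic `size` bound (here $3 \cdot x$) must be used instead of its expected version.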

### **6 Related Work, Implementation, and Conclusion**

Related Work Our approach adapts techniques from [18] to probabilistic programs. As explained in Sect. 1, this adaption is not at all trivial (see our proofs in [47]).

There has been a lot of work on proving PAST and inferring bounds on expected runtimes using supermartingales, e.g., [1, 11, 15, 16, 22–25, 29, 32, 48, 62]. While these techniques infer one (lexicographic) ranking supermartingale to analyze the complete program, our approach deals with information flow between different program parts and analyzes them separately.

There is also work on modular analysis of almost sure termination (AST) [1, 25, 26, 37, 38, 48], i.e., termination with probability 1. This differs from our results, since AST is compositional, in contrast to PAST (see, e.g., [41, 42]).

A fundamentally different approach to ranking supermartingales (i.e., to forward-reasoning) is backward-reasoning by so-called expectation transformers, see, e.g., [10, 41, 42, 44–46, 50, 52, 61]. In this orthogonal reasoning, [10, 41, 42, 52] consider the connection of the expected runtime and size. While expectation transformers apply backward- instead of forward-reasoning, their correctness can also be justified using supermartingales. More precisely, Park induction for upper bounds on the expected runtime via expectation transformers essentially ensures that a certain stochastic process is a supermartingale (see [33] for details).

To the best of our knowledge, the only available tools for the inference of upper bounds on the expected runtimes of probabilistic programs are [10, 50, 61, 62]. The tool of [61] deals with data types and higher order functions in probabilistic ML programs and does not support programs whose complexity depends on (possibly negative) integers (see [55]). Furthermore, the tool of [48] focuses on proving or refuting (P)AST of probabilistic programs for so-called Prob-solvable loops, which do not allow for nested or sequential loops or non-determinism. So both [61] and [48] are orthogonal to our work. We discuss [10, 50, 62] below.

**Implementation.** We implemented our analysis in a new version of our tool KoAT [18]. KoAT is an open-source tool written in OCaml, which can also be downloaded as a Docker image and accessed via a web interface [43].

Given a PIP, the analysis proceeds as in Alg. 1. The preprocessing in Line 1 adds invariants to guards (using APRON [39] to generate (non-probabilistic) invariants), unfolds transitions [19], and removes unreachable locations, transitions with probability 0, and transitions with unsatisfiable guards (using Z3 [49]).

```
Input:  PIP (PV, L, GT, ℓ0)
 1:  preprocess the PIP
 2:  (RB, SB) ← perform non-probabilistic analysis using [18]
 3:  (RB^E, SB^E) ← lift (RB, SB) to an expected bound pair with Thm. 10
 4:  repeat
 5:      for all SCCs C of the general result variable graph in topological order do
 6:          if C = {α} is trivial then SB'^E ← improve SB^E for C by Thm. 23
 7:          else SB'^E ← improve SB^E for C by Thm. 25
 8:          for all α ∈ C do SB^E(α) ← min{SB^E(α), SB'^E(α)}
 9:      for all general transitions g ∈ GT do
10:          RB'^E ← improve RB^E for GT_> = {g} by Thm. 16
11:          RB^E(g) ← min{RB^E(g), RB'^E(g)}
12:  until no bound is improved anymore
Output: Σ_{g ∈ GT} RB^E(g)
```

**Algorithm 1:** Overall approach to infer bounds on expected runtimes

We start with a non-probabilistic analysis and lift the resulting bounds to an initial expected bound pair (Lines 2 and 3). Afterwards, we first try to improve the expected size bounds using Thm. 23 and 25, and then we attempt to improve the expected runtime bounds using Thm. 16 (if we find a PLRF using Z3). To determine the "minimum" of the previous and the new bound, we use a heuristic which compares polynomial bounds by their degree. While we over-approximated the maximum of expressions by their sum to ease readability in this paper, KoAT also uses bounds containing "min" and "max" to increase precision.

This alternating modular computation of expected size and runtime bounds is repeated so that one can benefit from improved expected runtime bounds when computing expected size bounds and vice versa. We abort this improvement of expected bounds in Alg. 1 if they are all finite (or when reaching a timeout).
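The alternation in Alg. 1 can be illustrated with a minimal, self-contained sketch. The rule tables below are hypothetical stand-ins for the theorems (KoAT itself is written in OCaml and works on real PIPs); with bounds abstracted to plain numbers, they only demonstrate how a newly finite runtime bound can unlock a finite size bound and vice versa, with the minimum kept at each step:

```python
import math

INF = math.inf

# Hypothetical stand-ins for Thm. 23/25 (size) and Thm. 16 (runtime):
# each rule maps the current bound tables to a candidate improved bound.
SIZE_RULES = {
    "x@t1": lambda RB, SB: RB["t1"] + 1,            # needs RB(t1) finite
    "y@t2": lambda RB, SB: RB["t2"] + SB["x@t1"],   # needs RB(t2), SB(x@t1)
}
RT_RULES = {
    "t1": lambda RB, SB: 1,                          # unconditionally finite
    "t2": lambda RB, SB: SB["x@t1"],                 # needs SB(x@t1) finite
}

def analyze():
    RB = {g: INF for g in RT_RULES}      # expected runtime bounds
    SB = {a: INF for a in SIZE_RULES}    # expected size bounds
    improved = True
    while improved:                      # "until no bound is improved"
        improved = False
        for alpha, rule in SIZE_RULES.items():
            new = rule(RB, SB)
            if new < SB[alpha]:          # keep the minimum of old and new
                SB[alpha], improved = new, True
        for g, rule in RT_RULES.items():
            new = rule(RB, SB)
            if new < RB[g]:
                RB[g], improved = new, True
    return sum(RB.values()), RB, SB      # overall bound: sum over g in GT

total, RB, SB = analyze()
```

Running this toy fixpoint, RB(t1) becomes finite first, which then makes SB(x@t1), RB(t2), and finally SB(y@t2) finite in later rounds, mirroring the repeat-until-stable loop of Alg. 1.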

To assess the power of our approach, we performed an experimental evaluation of our implementation in KoAT. We did not compare with the tool of [62], since [62] expects the program to be annotated with already computed invariants. But for many of the examples in our experiments, the invariant generation tool [56] used by [62] did not find invariants strong enough to enable a meaningful analysis (and we could not apply APRON [39] due to the different semantics of invariants).

Instead, we compare KoAT with the tools Absynth [50] and eco-imp [10], which are both based on a conceptually different backward-reasoning approach. We ran the tools on all 39 examples from Absynth's evaluation in [50] (except recursive, which contains non-tail-recursion and thus cannot be encoded as a PIP), and on the 8 additional examples from the artifact of [50]. Moreover, our collection has 29 additional benchmarks: 14 examples that illustrate different aspects of PIPs, 5 PIPs based on examples from [50] where we removed assumptions, and 10 PIPs based on benchmarks from the TPDB [59] where some transitions were enriched with probabilistic behavior. The TPDB is a collection of typical programs used in the annual Termination and Complexity Competition [31]. We ran the experiments on an iMac with an Intel i5-2500S CPU and 12 GB of RAM under macOS Sierra for Absynth and NixOS 20.03 for KoAT and eco-imp. A timeout of 5 minutes per example was applied for all tools. The average runtime of successful runs was 4.26 s for KoAT, 3.53 s for Absynth, and just 0.93 s for eco-imp.

Fig. 2: Results on benchmarks from [50]

Fig. 3: Results on our new benchmarks

Figs. 2 and 3 show the generated asymptotic bounds, where n is the maximal absolute value of the program variables at the program start. Here, "∞" indicates that no finite time bound could be computed and "TO" means "timeout". The detailed asymptotic results of all tools on all examples can be found in [43, 47].

Absynth and eco-imp slightly outperform KoAT on the examples from Absynth's collection, while KoAT is considerably stronger than both tools on the additional benchmarks. In particular, Absynth and eco-imp outperform our approach on examples with nested probabilistic loops. While our modular approach can analyze inner loops separately when searching for probabilistic ranking functions, Thm. 16 then requires non-probabilistic time bounds for all transitions entering the inner loop. But these bounds may be infinite if the outer loop has probabilistic behavior itself. Moreover, in contrast to our work and [10], the approach of [50] does not require weakly monotonic bounds.

On the other hand, KoAT is superior to Absynth and eco-imp on large examples with many loops, where only a few transitions have probabilistic behavior (this might correspond to the typical application of randomization in practical programming). Here, we benefit from the modularity of our approach which treats loops independently and combines their bounds afterwards. Absynth and eco-imp also fail for our leading example of Fig. 1, while KoAT infers a quadratic bound. Hence, the tools have particular strengths on orthogonal kinds of examples.

KoAT's source code is available at https://github.com/aprove-developers/KoAT2-Releases/tree/probabilistic. To obtain a KoAT artifact, see https://aprove-developers.github.io/ExpectedUpperBounds/ for a static binary and Docker image. This web site also provides all examples from our evaluation, detailed outputs of our experiments, and a web interface to run KoAT directly online.

**Conclusion.** We presented a new modular approach to infer upper bounds on the expected runtimes of probabilistic integer programs. To this end, non-probabilistic and expected runtime and size bounds on parts of the program are computed in an alternating fashion and then combined to an overall expected runtime bound. In the evaluation, our tool KoAT succeeded on 91% of all examples, while the main other related tools (Absynth and eco-imp) only inferred finite bounds for 68% resp. 77% of the examples. In future work, it would be interesting to consider a modular combination of these tools (resp. of their underlying approaches).

**Acknowledgements.** We thank Carsten Fuhs for discussions on initial ideas.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Probabilistic and Systematic Coverage of Consecutive Test-Method Pairs for Detecting Order-Dependent Flaky Tests**

Anjiang Wei¹, Pu Yi¹, Tao Xie¹ (✉), Darko Marinov², and Wing Lam²

¹ Peking University, Beijing, China
{weianjiang,lukeyi,taoxie}@pku.edu.cn
² University of Illinois at Urbana-Champaign, Urbana, IL, USA
{marinov,winglam2}@illinois.edu

**Abstract.** Software developers frequently check their code changes by running a *set* of tests against their code. Tests that can nondeterministically pass or fail when run on the same code version are called *flaky tests*. These tests are a major problem because they can mislead developers to debug their recent code changes when the failures are unrelated to these changes. One prominent category of flaky tests is order-dependent (OD) tests, which can deterministically pass or fail depending on the *order* in which the set of tests are run. By detecting OD tests in advance, developers can fix these tests before they change their code. Due to the high cost required to explore all possible orders (n! permutations for n tests), prior work has developed tools that randomize orders to detect OD tests. Experiments have shown that randomization can detect many OD tests, and that most OD tests depend on just one other test to fail. However, there was no analysis of the probability that randomized orders detect OD tests. In this paper, we present the first such analysis and also present a simple change for sampling random test orders to increase the probability. We finally present a novel algorithm to systematically explore all consecutive pairs of tests, guaranteeing to detect all OD tests that depend on one other test, while running substantially fewer orders and tests than simply running all test pairs.

**Keywords:** Flaky tests · Order dependent · Test-pair coverage

### **1 Introduction**

The most common way that developers check their software is through frequent regression testing performed while they develop software. Developers run regression tests to check that recent code changes do not break existing functionality. A major problem for regression testing is flaky tests [27], which can nondeterministically pass or fail when run on the same code version. The failures from

Tao Xie is with the Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, China, and is the corresponding author.

<sup>©</sup> The Author(s) 2021

J. F. Groote and K. G. Larsen (Eds.): TACAS 2021, LNCS 12651, pp. 270–287, 2021. https://doi.org/10.1007/978-3-030-72016-2_15

these tests can mislead developers to debug their recent changes while the failures can be due to a variety of reasons unrelated to the changes. Many software organizations have reported flaky tests as one of their biggest problems in software development, including Apple [18], Facebook [5,10], Google [8,30,31,43,48], Huawei [16], Microsoft [11,12,20,21], and Mozilla [40].

These flaky tests are among the tests, called a test suite, that developers run during regression testing; a test suite is most often specified as a set, not a sequence, of tests. Having a test suite as a set provides benefits for regression testing techniques such as selection, prioritization, and parallelization [23,45]. The test execution platform can choose to run these tests in various test orders. For example, for projects using Java, the most popular testing framework is JUnit [17], and the most popular build system is Maven [28]. Tests in JUnit are organized in a set of test classes, each of which has a set of test methods. By default, Maven runs tests using the Surefire plugin [29], which does not guarantee any order of test classes or test methods. However, the use of Surefire and JUnit does not interleave the test methods from different test classes in a test order. The same structure is common for many other testing frameworks such as TestNG [41], Cucumber [4], and Spock [38].

One prominent category of flaky tests is deterministic order-dependent (OD) tests [22,24,32,47], which can deterministically pass or fail in various test orders, with at least one order in which these tests pass and at least one other order in which they fail. Other flaky tests are non-deterministic (ND) tests, which are flaky due to reasons other than solely the test order [24]; for at least one test order, these tests can nondeterministically pass or fail even in that same test order. Our iDFlakies work [22] has released the iDFlakies dataset [15] of flaky tests in open-source Java projects. We obtained this dataset by running test suites many times in randomized test orders, collecting test failures, and classifying failed tests as OD or ND flaky tests. In total, 50.5% of the dataset are OD tests, while the remaining 49.5% are ND tests.

Prior research has proposed multiple tools [2,6,9,14,22,47] to detect OD tests. Some of the tools [9,14] search for potential OD tests and may therefore report false alarms, i.e., tests that cannot fail in the current test suite (but may fail in some extended test suite). The other tools [2,6,22,47] detect OD tests that actually fail by running multiple randomized orders of the test suite. Running tests in random orders is also available in many testing platforms, e.g., Surefire for Java has a mode to randomize the order of test classes, pytest [35] for Python has the --random-order option, and rspec [36] for Ruby has the --order random option. While these tools can detect many OD tests, the tools run random orders and hence can miss running test orders in which OD tests would fail. The listed prior work has not studied the flake rates, i.e., the probability that an OD test would fail when run in (uniformly) sampled test orders.

Our iFixFlakies work [37] has studied the causes of failures for OD tests. We find that the vast majority of OD tests are related to pairs of tests, i.e., each OD test would pass or fail due to the sharing of some global state with just one other test. Our iFixFlakies work has also defined multiple kinds of tests related to OD tests. Each OD test belongs to one of two kinds: (1) brittle, which is a test that fails when run by itself but passes in a test order where the test is preceded by a state-setter; and (2) victim, which is a test that passes when run by itself but fails in a test order where the test is preceded by a (state-)polluter unless a (state-)cleaner runs in between the polluter and the victim. Most of the work in this paper focuses on victim tests because most OD tests are victims rather than brittles (e.g., 91% of the truly OD tests in the iDFlakies dataset are victims [15]), and the analysis for brittles often follows as a simple special case of the analysis for victims.

This paper makes the following two main contributions.

**Probability Analysis.** We develop a methodology to analytically obtain the flake rates of OD tests and propose a simple change to the random sampling of test orders to increase the probability of detecting OD tests. A flake rate is defined as the ratio of the number of test orders in which an OD test fails divided by the total number of orders. Flake rates can help researchers analytically compare various algorithms (e.g., comparing reversing a passing order to sampling a random order as shown in Section 4.4) and help practitioners prioritize the fixing of flaky tests. Specifically, we study the following problem: determine the flake rate for a given victim test with its set of polluters and a set of cleaners for each polluter. We first derive simple formulas with two main assumptions: (A1) all polluters have the same set of cleaners and (A2) all of the victim, polluters, and cleaners are in the same test class. We then derive formulas that keep A1 but relax A2. Our results on 249 real flaky tests show that our formulas are applicable to 236 tests (i.e., only 13 tests violate A1). To relax both assumptions, we propose an approach to estimate the flake rate without running test orders. Our analysis finds that some OD tests have a rather low flake rate, as low as 1.2%.
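As a concrete instance of such a flake-rate computation, the following sketch exhaustively enumerates all orders of a tiny hypothetical suite (one victim v, one polluter p, one cleaner c, and one unrelated test o; all names invented for illustration) and counts the orders in which the victim fails:

```python
from itertools import permutations

def victim_fails(order):
    """Victim v fails iff p runs before v with no c in between."""
    polluted = False
    for t in order:
        if t == "p":
            polluted = True          # polluter dirties the shared state
        elif t == "c":
            polluted = False         # cleaner resets the shared state
        elif t == "v":
            return polluted          # victim observes the state
    return False

orders = list(permutations(["v", "p", "c", "o"]))
failing = sum(victim_fails(o) for o in orders)
flake_rate = failing / len(orders)   # fraction of orders in which v fails
```

Of the 24 permutations, exactly the orders with p before v and no intervening c fail, giving a flake rate of 1/3 for this toy suite.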

**Systematic Test-Pair Exploration.** Because random sampling of test orders may miss test orders in which OD tests fail, we propose a systematic approach to cover all consecutive test pairs to detect OD tests. We present an algorithm that systematically explores all consecutive test pairs, guaranteeing the detection of all OD tests that depend on one other test, while running substantially fewer tests than a naive exploration that runs every pair by itself. Our algorithm builds on the concept of Tuscan squares [7], studied in the field of combinatorics. Given a test suite, the algorithm generates a set of test orders, each consisting of at least two distinct tests and at most all of the tests from the test suite, that cover all of the consecutive test pairs, while trying to minimize the cost of running those test orders. The algorithm can cover pairs of tests from the same and different classes, while considering only the test orders that do not interleave tests from different test classes, being a common constraint of testing frameworks such as JUnit [17]. Our analysis shows that the algorithm runs substantially fewer tests than naive exploration. To experiment with the new algorithm based on Tuscan squares, we run some of the test orders generated by the algorithm for some of the test suites in the iDFlakies dataset. Our experiments detect 44 new OD tests, not detected in prior work [22,24,25], and we have added the newly detected tests to the Illinois Dataset of Flaky Tests [19].
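The core combinatorial idea can be illustrated for a single class with an even number of tests: the classic "zigzag" rows r, r+1, r−1, r+2, r−2, ... (mod n) form a Tuscan square, i.e., n full-length orders that together cover every ordered pair of tests consecutively. This is only a sketch of the underlying construction, not the paper's full algorithm (which also handles class constraints and other suite sizes):

```python
def zigzag_orders(tests):
    """Tuscan-square rows for an even number of tests (single class)."""
    n = len(tests)
    assert n % 2 == 0, "this simple zigzag construction needs even n"
    # Offsets 0, +1, -1, +2, -2, ... describe one Hamiltonian path;
    # shifting it by every r mod n covers all n(n-1) ordered pairs.
    offsets = [0]
    for k in range(1, n):
        offsets.append(offsets[-1] + (k if k % 2 == 1 else -k))
    return [[tests[(r + off) % n] for off in offsets] for r in range(n)]

orders = zigzag_orders(["t1", "t2", "t3", "t4", "t5", "t6"])
covered = {(o[i], o[i + 1]) for o in orders for i in range(len(o) - 1)}
```

For n = 6 this yields 6 orders of 6 tests each, and the 6 · 5 = 30 consecutive pairs they contain are exactly all ordered test pairs.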

```
1 public void testMRAppMasterSuccessLock() { // testV for short
2 ... // setup MapReduce job, e.g., set conf and userName
3 MRAppMaster appMaster =
4 new MRAppMasterTest("appattempt_...", "container_...", "host", -1,
5 -1, System.currentTimeMillis(), false, false);
6 try {
7 MRAppMaster.initAndStartAppMaster(appMaster, conf, userName);
8 } catch (IOException e) { ... }
9 ... // assert the state and some properties of appMaster
10 appMaster.stop();
11 }
```
**Fig. 1.** Victim test from Hadoop's TestMRAppMaster class.

```
1 public void testSigTermedFunctionality() { // testP for short
2 JHEventHandlerForSigtermTest jheh =
3 new JHEventHandlerForSigtermTest(Mockito.mock(AppContext.class), 0);
4 jheh.addToFileMap(Mockito.mock(JobId.class));
5 ... // have jheh handle a few events
6 jheh.stop();
7 ... // assert whether the events were handled properly
8 }
```
**Fig. 2.** Polluter test from Hadoop's TestJobHistoryEventHandler class.

### **2 Background and Example**

We use an example to introduce some key concepts for OD tests and to illustrate challenges in debugging these tests. We represent a test order as a sequence of tests ⟨t1, t2, ..., tl⟩. In Java, each test order is executed by a Java Virtual Machine (JVM) that starts from the initial state (e.g., all shared pointer variables initialized to null) and then runs each test, which potentially modifies the shared state. Each test is run at most once in one JVM run. (Thus, covering test orders and test pairs has to be done with a set of test orders and cannot be done with just one very long order, e.g., using superpermutations [13].) A test v is a victim if it passes in the order ⟨v⟩ but fails in another order; the other order usually contains a single polluter test p (besides many other tests) such that v fails even in the order ⟨p, v⟩. Moreover, the test suite may contain a cleaner test c such that v passes in the order ⟨p, c, v⟩. Note that test orders may contain more tests besides polluters and cleaners for a victim v, but these other tests do not modify the relevant state and do not affect whether v passes or fails in any order. Precise definitions for these tests are in our previous work [37].
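These definitions can be made concrete with a toy model (all tests hypothetical): a module-level dictionary plays the role of a static field shared across tests in one JVM, the polluter leaves an entry behind, and the cleaner resets it:

```python
# Toy model of shared JVM state; test names are invented for illustration.
shared_state = {}

def test_p():            # polluter: leaves stale state behind
    shared_state["job"] = "stopped"

def test_c():            # cleaner: resets the shared state
    shared_state.clear()

def test_v():            # victim: fails iff stale state is present
    return "fail" if "job" in shared_state else "pass"

def run(order):
    shared_state.clear() # each test order starts in a fresh "JVM"
    verdict = None
    for test in order:
        result = test()
        if test is test_v:
            verdict = result
    return verdict

assert run([test_v]) == "pass"                   # order <v>
assert run([test_p, test_v]) == "fail"           # order <p, v>
assert run([test_p, test_c, test_v]) == "pass"   # order <p, c, v>
```

The three assertions reproduce exactly the defining orders ⟨v⟩, ⟨p, v⟩, and ⟨p, c, v⟩ from the text.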

Figure 1 shows a snippet of a victim test, testMRAppMasterSuccessLock (in short testV), from the widely used Hadoop project [1]. The test suite for this test has 392 tests. This test is from the MapReduce (MR) framework and aims to check an MR application. This test is a victim because it passes when run by itself but has two polluter tests. If the victim is run after either one of its polluter tests (and no cleaner runs in between the polluter and the victim), then the victim fails with a NullPointerException. Figure 2 shows a snippet of one of these two polluter tests, testSigTermedFunctionality (in short testP).

These tests form a polluter-victim pair because they share a global state, namely all "active" jobs stored in a static map in the JobHistoryEventHandler class. (In JUnit 4, only the heap state reachable from the class fields declared as static is shared across tests; JUnit does not automatically reset that state, but developers can add setup and teardown methods to reset the state.) To check an MR application, testV first sets up some state (Line 2), then creates an MR application (Line 3), and starts the application (Line 7). The NullPointerException arises when the test tries to stop the MR application (Line 10). Specifically, the appMaster accesses the shared map data structure that tracks all jobs run by any application. When testV is run after testP, then appMaster will attempt to stop a job created by the polluter, although the job has already been stopped.

This static map is empty when the JVM starts running a test order, and it is also explicitly cleared by some tests. In fact, we find 11 cleaner tests that clear the map, and the victim passes when any one of these 11 tests is run between testP and testV. Interestingly, for the other polluter test, testTimelineEventHandling (in short testP'), the victim fails for the same reason, but testP' has 31 cleaners—the same 11 as testP and 20 other cleaners. Our manual inspection finds that the testP' polluter has other cleaners because the job created by testP' is named job_200_0001, while the job created by the testP polluter is a mock object. The 20 other cleaners also create and stop jobs named job_200_0001 and therefore act as cleaners for the testP' polluter but not the testP polluter. This example illustrates not only how victims and polluters work but also the complexity in how these tests interact with cleaners.

In Section 4.2, we explore how to compute the flake rate for a victim test, i.e., the probability that the test fails in a randomly sampled test order of all tests in the test suite. For this example, the 392 tests could, in theory, be run in 392! (∼10^848) test orders (permutations), but in practice, JUnit never interleaves test methods from different test classes. These tests are split into 48 classes that actually have ∼10^234 test orders that JUnit could run. The relevant 34 tests (1 victim, 2 polluters, and 31 cleaners) belong to 8 test classes: 2 polluters belong to one class (TestJobHistoryEventHandler), 11 cleaners belong to the same class as the polluters, 1 cleaner belongs to the same class as the victim (TestMRAppMaster), and the remaining 19 cleaners belong to six other classes. For this victim, randomly sampling the orders that JUnit could run gives a flake rate of 4.5%. In Section 4.4, we propose a simple change to increase the probability of detecting OD tests by running a reverse of each passing test order. For this victim, the conditional probability that the reverse order fails is 4.9%.
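The 392! figure can be sanity-checked numerically; using the log-gamma function avoids computing the huge factorial itself (this checks only the ∼10^848 count, since the ∼10^234 count additionally needs the per-class test sizes, which are not listed here):

```python
from math import lgamma, log

# log10(392!) = ln(392!) / ln(10); lgamma(n + 1) computes ln(n!)
digits = lgamma(392 + 1) / log(10)
```

The result is close to 848, matching the order of magnitude stated above.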

A commonly asked question is whether all detected OD tests should be fixed. While ideally all flaky tests should be fixed, some are not fixed [21,23]. For the majority of OD tests, fixing them is good to prevent flaky-test failures that can mislead the developers into debugging the wrong parts of the code; also, fixing OD tests enables tests to be run in any order, which then enables the use of beneficial regression-testing techniques [23]. Some OD tests are intentionally run in specific orders (e.g., using the @FixMethodOrder annotation in JUnit) to speed up testing by reusing states. We have submitted fixes for a large number of flaky tests in our prior work [19].

### **3 Preliminaries**

We next formalize the concepts that we have introduced informally and define some new concepts. Let T = {t1, t2, ..., tn} be a set of n tests partitioned into k classes C = {C1, C2, ..., Ck}. We use class(t) to denote the class of test t. Each class Ci has ni = |{t ∈ T | class(t) = Ci}| tests.

We use ω(T′) to denote a test order, i.e., a permutation of the tests in T′ ⊆ T, and drop T′ when clear from the context. We use ωi to denote the i-th test in the test order ω, and |ω| to denote the length of a test order as measured by the number of tests. We use t ≺ω t′ to denote that test t is before t′ in the test order ω. We will analyze some cases that allow all n! permutations, potentially interleaving tests from different classes. We use ΩA(T) to denote the set of all test orders for T. Some testing tools [47] explore all these test orders, potentially generating false alarms because most testing frameworks [4,17,38,41] do not allow all these test orders.

We are primarily concerned with class-compatible test orders where all tests from each class are consecutive, i.e., if class(ωi) = class(ωi′) for i < i′, then for all j with i < j < i′, class(ωi) = class(ωj). We use ΩC(T) to denote the set of all class-compatible test orders for T. The number of such class-compatible test orders is k! · ∏_{i=1}^{k} ni!. Section 4.2 presents how to compute the flake rate, i.e., the percentage of test orders in which a given victim test (with its polluters and cleaners) fails.

Section 5 presents how to systematically generate test orders to ensure that all test pairs are covered. A test pair ⟨t, t′⟩ consists of two distinct tests t ≠ t′. We say that a test order ω covers a test pair ⟨t, t′⟩, in notation cover(ω, t, t′), iff the two tests are consecutive in ω, i.e., ω = ⟨..., t, t′, ...⟩. Considering consecutive tests is important because a victim may not fail if not run right after a polluter, i.e., when a cleaner is run between the polluter and the victim. A set of test orders Ω covers the union of the test pairs covered by each test order ω ∈ Ω. In general, test orders in a set can be of different lengths. Each test order ω covers |ω| − 1 test pairs.

We distinguish intra-class test pairs, where class(t) = class(t′), and inter-class test pairs, where class(t) ≠ class(t′). Of the total n(n − 1) test pairs, each class Ci has ni(ni − 1) intra-class test pairs, and the number of inter-class test pairs is 2 · ∑_{1≤i<j≤k} ni·nj. Each class-compatible test order of all tests in T covers ni − 1 intra-class test pairs for each class Ci and k − 1 inter-class test pairs.
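The counting formulas in this section are easy to sanity-check by brute force on a tiny hypothetical suite (class sizes 2, 3, 1 chosen arbitrarily):

```python
from itertools import permutations
from math import factorial, prod

# Tiny hypothetical suite: k = 3 classes with sizes 2, 3, 1 (n = 6).
sizes = [2, 3, 1]
n, k = sum(sizes), len(sizes)
tests = [(ci, j) for ci, ni in enumerate(sizes) for j in range(ni)]

def class_compatible(order):
    # all tests of a class must appear as one contiguous block
    seen, last = set(), None
    for (ci, _) in order:
        if ci != last and ci in seen:
            return False
        seen.add(ci)
        last = ci
    return True

compat = [o for o in permutations(tests) if class_compatible(o)]
n_compat = len(compat)                   # should equal k! * prod(ni!)

intra = sum(ni * (ni - 1) for ni in sizes)
inter = 2 * sum(sizes[i] * sizes[j]
                for i in range(k) for j in range(i + 1, k))

# a full class-compatible order crosses class boundaries exactly k-1 times
first = compat[0]
inter_covered = sum(1 for a, b in zip(first, first[1:]) if a[0] != b[0])
```

Enumeration confirms k! · ∏ ni! = 72 class-compatible orders out of 720 permutations, and that the intra- and inter-class pair counts add up to n(n − 1) = 30.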

We aim to generate a set of test orders Ω that covers all test pairs³. If we consider ΩA(T), which allows all test orders, we need at least n test orders to cover all n(n − 1) test pairs. When we have only one class or all classes have only one test, then all test orders are class-compatible. However, consider the more common case when we have more than one class and some class has more than one test. If we consider ΩC(T), which allows only class-compatible test orders, we need at least max_{i=1..k} ni test orders to cover all intra-class test pairs and at least M = ⌈2 · ∑_{1≤i<j≤k} ni·nj / (k − 1)⌉ test orders to cover all inter-class test pairs; because M > max_{i=1..k} ni, we need at least M class-compatible test orders to cover all test pairs.

³ This problem should not be confused with *pairwise testing* [33], which typically aims to cover pairs of values from different test parameters.

More precisely, we aim to generate a set of test orders Ω that has the lowest cost for test execution. The cost for each test order ω can be modeled well as a sum of a fixed cost Cost0 (e.g., corresponding to the time required to start a JVM and load required classes) and a cost for each test (e.g., the time to execute the test method): Cost(ω) = Cost0 + ∑_{t∈ω} Cost(t). The cost for a set of test orders is then simply the sum of the individual costs: Cost(Ω) = ∑_{ω∈Ω} Cost(ω). For example, a trivial way to cover all test pairs is with a set of test orders where each test order is just a test pair: Ωp = {⟨t, t′⟩ | t, t′ ∈ T ∧ t ≠ t′}; however, the cost is unnecessarily high: Cost(Ωp) = n(n − 1) · Cost0 + 2(n − 1) · Cost(T), where Cost(T) = ∑_{t∈T} Cost(t).

To simplify, we can assume that each test in T has the same cost, say, Cost1, and then Cost(Ωp) = n(n − 1) · Cost0 + 2n(n − 1) · Cost1. In the optimal case, each test order would be a permutation of all n tests covering n − 1 test pairs, and the number of test orders would be just n(n − 1)/(n − 1) = n. Therefore, the lowest cost is Cost(Ωopt) = n · Cost0 + n² · Cost1, demonstrating that the factor for Cost0 can be substantially reduced, while the factor for Cost1 is nearly halved (n²/(2n(n − 1)) = n/(2(n − 1)) ≈ 1/2). However, in most realistic cases, due to the constraints of class-compatible test orders and the big differences in the number of tests across different classes, we cannot reach the optimal case.
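Under the uniform-cost assumption, the two extremes compare as follows; this is a direct transcription of the formulas, with arbitrary example values chosen for Cost0 and Cost1:

```python
def cost_all_pairs(n, cost0, cost1):
    # Omega_p: n(n-1) orders, each of length 2
    return n * (n - 1) * cost0 + 2 * n * (n - 1) * cost1

def cost_optimal(n, cost0, cost1):
    # n full-length orders, each covering n-1 consecutive pairs
    return n * cost0 + n * n * cost1

n, c0, c1 = 100, 5.0, 0.1                  # example values, not from the paper
trivial = cost_all_pairs(n, c0, c1)
optimal = cost_optimal(n, c0, c1)
cost1_ratio = (n * n) / (2 * n * (n - 1))  # n / (2(n-1)), close to 1/2
```

For n = 100 the fixed-cost term shrinks from n(n − 1) = 9900 JVM starts to just 100, and the per-test term is roughly halved.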

#### **3.1 Dataset for Evaluation**

Besides deriving some analytical results, we also run some empirical experiments on flaky tests from Java projects. Our recent work [25] ran the iDFlakies tool on most test suites in the projects from the iDFlakies dataset [15] using the configurations recommended by our iDFlakies work [22]. Specifically, we ran 100 randomly sampled test orders from ΩC(T) and 1 test order that is the reverse order of what Maven Surefire [29] runs by default. Note that unlike our work in Section 4.4, where we propose running a reverse test order of every test order where all tests passed, the one reverse order that we ran in our recent work [25] may or may not have been from a passing test order, and the reverse order is run only once and not for every passing test order.

Each project in the iDFlakies dataset is a Maven-based, Java project organized into one or more modules, which are (sub)directories that organize code under test and test code. Each module contains its own test suite. For the remainder of the paper, we use the 121 modules in which our recent work [25] found at least one flaky test (but not necessarily OD test). To illustrate diversity among these 121 modules, the number of classes ranges from 1 to 2215, with an average of 61, and the total number of tests ranges from 1 to 4781, with an average of 287. The number of tests per class ranges from 1 to 200, with an average of 4.8.

When we run some of the test orders generated by our systematic test-pair exploration as described in Section 5.2, we detect a total of 249 OD tests in 44 of the 121 modules. Of the 249 OD tests, 57 are brittles and 192 are victims. Compared to the OD tests detected in our prior work [22,24,25] that used the iDFlakies dataset, we find 44 new OD tests that have not been detected before. Of the 44 OD tests, 1 is brittle and 43 are victims. One of the newly detected victim tests (testMRAppMasterSuccessLock) is shown in Section 2.

### **4 Analysis of Flake Rate and Simple Algorithm Change**

We next discuss how to compute the flake rate for each OD test. Let T be a test suite with an OD test. Prior work [22,24,25,47] would run many test orders of T and compute the flake rate for each test as a ratio of the number of test failures and the number of test runs. However, failures of flaky tests are probabilistic, and running even many test orders may not suffice to obtain the true flake rate for each test. Running more test orders is rather costly in machine time; in the limit, we may need to run all |T|! permutations to obtain the true flake rate for OD tests. To reduce the machine time needed for computing the flake rate for OD tests, we first propose a new procedure, and then derive formulas based on this procedure. We finally show a simple change for sampling random test orders to increase the probability of detecting OD tests.

#### **4.1 Determining Test Outcome without Running a Test Order**

We use a two-step procedure to determine the test outcome for a given OD test. We assume that some prior runs already detected the OD test, and the goal is to determine the test outcome for some new test orders that were not run.

In Step 1, we classify how each test from T relates to each OD test in a simple setting that runs only up to three tests. Specifically, we first determine whether an OD test t is a victim or a brittle by running the test in isolation, i.e., just t by itself, 10 times: if t always passes, it is considered a victim (although it may be an ND test); if t always fails, it is considered a brittle (although it may be an ND test); and if t sometimes passes and sometimes fails, it is definitely an ND, not OD, test. This approach was proposed for iFixFlakies [37], and using 10 runs is a common industry practice to check whether a test is flaky [31,40].

We then find (1) for each victim, all its single polluters in T and also all single cleaners for each polluter, and (2) for each brittle, all its single state-setters in T. To find polluters (resp. state-setters) of a victim (resp. brittle) test, iFixFlakies [37] takes as input a test order (of the entire T) where the test failed (resp. passed) and then searches the prefix of the test in that test order using delta debugging [46] (an extended version of binary search). While iFixFlakies can find all polluters (resp. state-setters) in the prefix, it does not necessarily find all polluters in T, and it takes substantial time to find these polluters using delta debugging. The experiments show that in 98% of cases, binary search finds one test to be a polluter, although some rare cases need a polluter group that consists of two tests.

We propose a simpler and faster approach to find polluters (resp. state-setters) for the most common case: for each victim v (resp. brittle b) and each test $t \in T \setminus \{v\}$ (resp. $t \in T \setminus \{b\}$), we run the pair of the test and the victim (resp. brittle), i.e., ⟨t, v⟩ (resp. ⟨t, b⟩). If the victim fails (resp. the brittle passes), then the test t is a polluter (resp. state-setter). Further, for each victim v, each of its polluters p, and each test $t \in T \setminus \{v, p\}$, we run the triple ⟨p, t, v⟩; if v passes, then t is a cleaner for the pair of v and p. Note that for the same victim v, different polluters may have different cleaners, as in the example presented in Section 2.
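The pair-and-triple runs can be sketched as follows; `run_order`, `find_polluters`, `find_cleaners`, and the toy oracle are illustrative names, not the paper's tooling:

```python
# Sketch of Step 1's pair/triple runs. `run_order` is a hypothetical
# oracle: given a list of tests ending in the victim, it returns True
# iff that last test passes.

def find_polluters(victim, tests, run_order):
    # t is a single polluter if the victim fails in the pair <t, v>
    return {t for t in tests if t != victim and not run_order([t, victim])}

def find_cleaners(victim, polluter, tests, run_order):
    # t is a cleaner for (v, p) if the victim passes in the triple <p, t, v>
    return {t for t in tests
            if t not in (victim, polluter) and run_order([polluter, t, victim])}

# Toy oracle for demonstration: 'p' pollutes shared state, 'c' cleans it,
# and the victim 'v' passes iff the state is clean when it runs.
def toy_oracle(order):
    dirty = False
    for t in order[:-1]:
        if t == 'p':
            dirty = True
        elif t == 'c':
            dirty = False
    return not dirty  # outcome of the last test (the victim)
```

On the toy oracle, `find_polluters('v', ['p', 'c', 'v'], toy_oracle)` yields `{'p'}` and `find_cleaners('v', 'p', ['p', 'c', 'v'], toy_oracle)` yields `{'c'}`.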

In Step 2, we determine whether each OD test passes or fails in a given test order using only the abstraction from Step 1, without actually running the test order. We focus on victims because they are more complex than brittles; brittles can be viewed as special cases with slight changes (requiring a state-setter to run before a brittle to pass, rather than requiring a polluter not to run before a victim to pass). Without loss of generality, we consider one victim at a time. Intuitively, the victim fails in a test order if a polluter is run before the victim without a cleaner between the polluter and the victim. Formally, we define the test outcome as follows.

**Definition 1 (Test Outcome from Abstraction).** Let $T$ be a test suite with one victim $v \in T$, polluters $P \subset T$, and a family of cleaners $C_p \subset T$ indexed by each polluter $p \in P$. The outcome of $v$ in a test order $\omega$ is defined as follows:

$$\text{fail}(\omega) \equiv \exists p \in P.\; p \prec_\omega v \wedge \neg\exists c \in C_p.\; p \prec_\omega c \wedge c \prec_\omega v; \qquad \text{pass}(\omega) \equiv \neg\,\text{fail}(\omega).$$

This definition is an estimate of what one would obtain from all (repeated) runs of the $|T|!$ permutations, for three main reasons: (1) tests may behave differently in test orders than in isolation [24] (and an OD test may even be an ND test in some orders [24]); (2) polluters, cleaners, and state-setters may not be single tests but groups (iFixFlakies [37] reports that groups are rather rare); and (3) a test that fails in some prefix may behave differently for the tests that come after it in a test order than when the test passes (again, iFixFlakies [37] reports this issue to be rare, finding just one such case). Despite these potential sources of error, our evaluation shows that our use of abstraction obtains flake rates similar to those of iDFlakies for the orders that iDFlakies ran. Most importantly, the abstraction allows us to evaluate many more orders without actually running them, thus taking much less machine time.
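Definition 1 can be evaluated mechanically; the following sketch (illustrative names) computes a victim's outcome in a given order without running it:

```python
# Evaluating Definition 1 without running the order. `cleaners` maps
# each polluter to its set of cleaners.

def victim_fails(order, victim, polluters, cleaners):
    v = order.index(victim)
    for i, t in enumerate(order[:v]):
        if t in polluters:
            # fail if no cleaner of this polluter runs between it and the victim
            if not any(c in cleaners[t] for c in order[i + 1:v]):
                return True
    return False
```

For example, with polluter `'p'` cleaned by `'c'`, the order `['c', 'p', 'v']` fails while `['p', 'c', 'v']` and `['v', 'p', 'c']` pass.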

#### **4.2 Computing Flake Rate**

We next define flake rate, derive formulas for computing flake rate for two cases, and show why we need to sample test orders for other cases.

**Definition 2 (Flake Rate).** For a test suite T with exactly one victim, given a set of test orders Ω(T), the flake rate is defined as the ratio:

$$f(T) = |\{\omega \in \Omega(T) \mid \text{fail}(\omega)\}| \;/\; |\Omega(T)|;$$

we write $f_A$ and $f_C$ when we need to refer specifically to the flake rate for $\Omega_A(T)$ and $\Omega_C(T)$ (defined in Section 3), respectively.

We derive the formula for the flake rate based on the number of polluters and cleaners for two special cases. In general, computing the flake rate can ignore tests that are not relevant, i.e., not in $\{v\} \cup P \cup \bigcup_{p \in P} C_p$. It is easy to prove that $f(T) = f(T')$ if $T$ and $T'$ have the same victim, polluters, and cleaners; the reason is that the tests from $T \setminus T'$ are irrelevant in any order and do not affect the outcome of $v$. We omit the proof due to the space limit. The further analysis thus focuses only on the relevant tests.

**Special Case 1:** Assume that (A1) all polluters have the same set $C$ of cleaners: $C = C_p$ for all $p \in P$; and (A2) the victim, all polluters, and all cleaners are in the same class: $\forall t, t' \in \{v\} \cup P \cup C.\ \text{class}(t) = \text{class}(t')$; it follows that $\Omega_A(T) = \Omega_C(T)$ and $f_A = f_C$. Let $\pi = |P|$ and $\gamma = |C|$. The total number of permutations of the relevant tests is $(\pi + \gamma + 1)!$. While we could obtain $|\{\omega \in \Omega(T) \mid \text{fail}(\omega)\}|$ purely by definition, counting the test orders where the victim fails, we prefer a probabilistic approach that simplifies further proofs. A victim fails iff (1) it is not in the first position, with probability $(\pi+\gamma)/(\pi+\gamma+1)$, and (2) its immediate predecessor is a polluter, with probability $\pi/(\pi+\gamma)$, giving the overall flake rate $f(T) = \pi/(\pi+\gamma+1)$. This formula is simple, but real test suites often violate A1 or A2. Of the 249 tests used in our experiments, 13 violate both A1 and A2, 207 violate only A2, and only 29 violate neither.

**Special Case 2:** Keeping A1 but relaxing A2, assume that the victim is in class $C_1$ with $\pi_1$ polluters and $\gamma_1$ cleaners, and the other $k-1$ classes have $\pi_i$ polluters and $\gamma_i$ cleaners, $2 \le i \le k$, where in general either $\pi_i$ or $\gamma_i$, but not both, can be zero for any class except the victim's own class, where both $\pi_1$ and $\gamma_1$ can be zero. As in Special Case 1, we have $f_A(T) = (\sum_{i=1}^{k} \pi_i)/(\sum_{i=1}^{k} \pi_i + \sum_{i=1}^{k} \gamma_i + 1)$. Next, consider class-compatible test orders, which do not interleave tests from different classes. The victim fails iff (1) it fails within its own class, with probability $\pi_1/(\pi_1+\gamma_1+1)$, or (2) the following three conditions hold: (2.1) the victim is first in its own class, with probability $1/(\pi_1+\gamma_1+1)$; (2.2) its class is not the first among the classes, with probability $(k-1)/k$; and (2.3) the immediately preceding class ends with a polluter, with probability $\pi_i/(\pi_i+\gamma_i)$ for class $i$ and thus probability $\sum_{i=2}^{k} (\pi_i/(\pi_i+\gamma_i))/(k-1)$ averaged across all classes. Overall,

$$f_C(T) = \frac{\pi_1 + \frac{1}{k} \sum_{i=2}^{k} \frac{\pi_i}{\pi_i + \gamma_i}}{\pi_1 + \gamma_1 + 1}.$$

The formula is already more complex. It is important to note that we can have either $f_A(T) \ge f_C(T)$ or $f_C(T) \ge f_A(T)$, depending on the ratio of polluters to cleaners in the victim's own class vs. the ratio of polluters to cleaners in the other classes, i.e., neither set of test orders ensures a higher flake rate. We show in Section 4.3 that both cases arise in practice.
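The Special Case 1 formula can be sanity-checked by Monte Carlo sampling; a purely illustrative sketch (assuming A1 and A2, so the victim's fate is decided by its immediate predecessor):

```python
import random

# Monte Carlo check of the Special Case 1 formula f(T) = pi/(pi+gamma+1).

def flake_rate_formula(pi, gamma):
    return pi / (pi + gamma + 1)

def flake_rate_sampled(pi, gamma, trials=200_000, seed=0):
    rng = random.Random(seed)
    tests = ['v'] + ['p'] * pi + ['c'] * gamma
    fails = 0
    for _ in range(trials):
        order = tests[:]
        rng.shuffle(order)
        v = order.index('v')
        # under A1, the victim fails iff its immediate predecessor is a polluter
        fails += v > 0 and order[v - 1] == 'p'
    return fails / trials
```

With, e.g., 2 polluters and 3 cleaners, the sampled rate converges to $2/6 \approx 0.333$.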

**General Case:** In the most general case, relaxing A1 to allow different polluters to have different sets of cleaners, while also allowing the relevant tests to be spread across different classes, it appears challenging to derive a closed-form expression for $f_A(T)$, let alone for $f_C(T)$. We thus resort to estimating flake rates by sampling orders from $\Omega_A(T)$ or $\Omega_C(T)$ and counting what fraction of them fail according to Definition 1, as described in Section 4.3.

**Fig. 3.** Distribution of flake rate for two sets of test orders.

#### **4.3 Comparing Flake Rate for Different Sets of Test Orders**

While tools such as iDFlakies [22] incorporate the requirement of not interleaving tests from different classes in a test order, some other tools [47] do not incorporate this requirement, so they allow all test orders. Recall that $\Omega_A(T)$ denotes the set of all test orders and $\Omega_C(T)$ denotes the set of test orders that satisfy the requirement. The reason to run $\Omega_A(T)$ is to try to maximize the detection of all potential OD tests, at the risk that some detected failures would be false positives. In particular, a test failure observed in some non-class-compatible order may not be reproducible in any class-compatible prefix of that order, e.g., due to the various ways to customize JUnit [17] (with annotations such as @Before, @BeforeClass, @Rule) or similar testing frameworks. The reason to run only $\Omega_C(T)$ is to detect OD-test failures that developers can observe from running the tests and are therefore motivated to fix.

While both sets of test orders can detect all true-positive OD tests, it is not clear which set is more likely to detect them. Intuitively, running $\Omega_A(T)$ test orders is more likely to detect failures if cleaners and the victim are in the same class while polluters are in different classes; in such cases, class-compatible orders make polluters less likely to come between the cleaners and the victim. For example, for the victim presented in Section 2, the $\Omega_A(T)$ flake rate is 10.5%, while the $\Omega_C(T)$ flake rate is 4.5%. On the other hand, running $\Omega_C(T)$ test orders is more likely to detect failures if polluters and the victim are in the same class while cleaners are in different classes. Similar reasoning applies to brittles: if state-setters are more often in the same test class as the brittle, then the brittle is less likely to fail than if state-setters are more often in other classes.

To compare these sets of test orders on real OD tests, we use the dataset of 192 victim and 57 brittle tests described in Section 3.1. We collect all single-test polluters for each victim and all single-test cleaners for each polluter-victim pair. We also collect all single-test state-setters for the brittles. We then use either the formulas presented in Section 4.2 or a large number of uniformly sampled test orders to obtain the flake rates $f_A(T)$ and $f_C(T)$ for each test. Specifically, our formulas apply to 236 of the 249 tests. For the remaining 13 tests (all victims), we sample 100,000 test orders from each of $\Omega_A(T)$ and $\Omega_C(T)$ to estimate their flake rates.

Figure 3 summarizes the results. For each set of test orders, the figure shows a boxplot that visualizes the distribution of flake rates for the 249 OD tests. The $f_A(T)$ flake rates have a slightly higher mean (38.4%) than the $f_C(T)$ flake rates (38.0%). Statistical tests for paired samples of the flake rates—specifically, the dependent Student's t-test obtains a p-value of 0.47 and the Wilcoxon signed-rank test obtains a p-value of 0.01—show that the differences could be statistically significant (at the α = 0.05 level). However, if we omit the 13 tests that required sampling, the means are 38.3% for $f_A(T)$ and 38.6% for $f_C(T)$, and the difference is not statistically significant (the dependent Student's t-test obtains a p-value of 0.55, and the Wilcoxon signed-rank test obtains a p-value of 0.19).

Prior work [6,22,24,47] has not performed any explicit comparison between the two sets of test orders. Our results demonstrate that running $\Omega_A(T)$ might be more likely to detect true-positive OD tests; however, the failures detected by such test orders may include false positives. Future work on detecting OD tests should explore how to address false positives when $\Omega_A(T)$ test orders are run.

#### **4.4 Simple Change to Increase Probability of Detecting OD Tests**

Inspired by our probability analysis, we propose a simple change that increases the probability of detecting OD tests. The standard algorithm for sampling S random test orders simply repeats S times the following steps: (1) $\omega \leftarrow$ sample a random test order from the possible test orders ($\Omega_A(T)$ or $\Omega_C(T)$); (2) obtain the result $r \leftarrow \text{run}(\omega)$; (3) if $r$ is FAIL, then print $\omega$. (A variant [22] may store previously sampled test orders to avoid repetition, but the number of possible test orders is usually so large that sampling the same one twice is highly unlikely, so one can save space and time by not tracking previously sampled test orders.)

Our key change is to select the next test order as the reverse of a prior test order that passed: (4) if $r$ is PASS, then $\omega_R \leftarrow \text{reverse}(\omega)$. The intuition is that a passing order may have the polluter after the victim; reversing it puts the polluter before the victim, so the reverse of a passing order should have a higher probability of failing than a random order, which may have the polluter on either side of the victim. Note that the reverse of a class-compatible test order is also class-compatible, so this change applies to $\Omega_C(T)$ as well. The remaining changes are to run $\omega_R$, print it if it fails, and count the test orders properly so that exactly S test orders are sampled.
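The modified sampling loop can be sketched as follows; `sample_orders` and its `run` oracle are illustrative names, not the iDFlakies API:

```python
import random

# Sampling loop with the "reverse on pass" change.
# `run(order)` returns True iff the order passes.

def sample_orders(tests, S, run, rng=None):
    rng = rng or random.Random(0)
    failing = []
    pending = None  # reverse of the last passing order, if any
    for _ in range(S):
        if pending is not None:
            order, pending = pending, None  # try the reverse next
        else:
            order = tests[:]
            rng.shuffle(order)
        if run(order):
            pending = list(reversed(order))
        else:
            failing.append(order)  # a failing order exposes an OD test
    return failing
```

On a toy suite with one polluter and one victim (an order passes iff the victim runs first), random sampling fails about half the time, while the reverse-on-pass variant fails about two-thirds of the time.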

We next compute the probability that the reverse of a passing order fails. **Special Case 1:** Consider the Special Case 1 scenario from Section 4.2 with $\pi$ polluters and $\gamma$ cleaners. For the standard algorithm, $f(T) = f_A(T) = f_C(T) = \pi/(\pi+\gamma+1)$. For our change, the conditional probability that the second test order fails given that the first passes is $P(\text{fail}(\omega_R) \mid \text{pass}(\omega)) = P(\text{fail}(\omega_R) \wedge \text{pass}(\omega))/P(\text{pass}(\omega))$. We already have $P(\text{pass}(\omega)) = 1 - f(T) = (\gamma+1)/(\pi+\gamma+1)$.

To compute $P(\text{fail}(\omega_R) \wedge \text{pass}(\omega))$, we consider two cases based on the position of the victim in the passing test order $\omega$. (1) If the victim is first, with probability $1/(\pi+\gamma+1)$, then the second test must be a polluter, with probability $\pi/(\pi+\gamma)$, giving $\pi/((\pi+\gamma)(\pi+\gamma+1))$ for this case. (2) If the victim is not first, it cannot be last in $\omega$, because then $\omega_R$ would not fail; so the victim is in the middle, with probability $(\pi+\gamma-1)/(\pi+\gamma+1)$. We also need a cleaner right before the victim, with probability $\gamma/(\pi+\gamma)$, and a polluter right after the victim, with probability $\pi/(\pi+\gamma-1)$, giving $\pi\gamma/((\pi+\gamma)(\pi+\gamma+1))$ for this case. Summing the two cases, $P(\text{fail}(\omega_R) \wedge \text{pass}(\omega)) = \pi(\gamma+1)/((\pi+\gamma)(\pi+\gamma+1))$.

Finally, the conditional probability that the reverse test order fails given that the first test order passes is $$P(\text{fail}(\omega_R) \mid \text{pass}(\omega)) = \frac{\pi(\gamma+1)}{(\pi+\gamma)(\pi+\gamma+1)} \Big/ \frac{\gamma+1}{\pi+\gamma+1} = \frac{\pi}{\pi+\gamma}.$$ This probability is strictly larger than $f(T) = \pi/(\pi+\gamma+1)$, because $\pi > 0$ must hold for the victim to be a victim.
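This conditional probability can also be checked empirically; a sketch under the Special Case 1 assumptions (illustrative, not the paper's evaluation harness):

```python
import random

# Empirical check, for Special Case 1, that
# P(fail(reverse(w)) | pass(w)) = pi / (pi + gamma).

def reverse_fail_given_pass(pi, gamma, trials=200_000, seed=1):
    rng = random.Random(seed)
    tests = ['v'] + ['p'] * pi + ['c'] * gamma

    def fails(order):
        v = order.index('v')
        return v > 0 and order[v - 1] == 'p'  # A1: the predecessor decides

    passes = rev_fails = 0
    for _ in range(trials):
        order = tests[:]
        rng.shuffle(order)
        if not fails(order):
            passes += 1
            rev_fails += fails(order[::-1])
    return rev_fails / passes
```

For 2 polluters and 3 cleaners, the estimate converges to $2/5 = 0.4$, above the unconditional flake rate $2/6$.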

**Special Case 2:** For the Special Case 2 scenario from Section 4.2, the common case is $\pi_1 + \gamma_1 > 0$ (i.e., the victim's class $C_1$ has at least one other relevant test). Based on the relative position of the victim in class $C_1$, we consider three cases: the victim runs first, in the middle, or last in $C_1$. Calculating the probability for the three cases separately and summing them, we obtain $$P(\text{fail}(\omega_R) \wedge \text{pass}(\omega)) = \frac{\pi_1 + k\pi_1\gamma_1 + \pi_1 S_\gamma + \gamma_1(\pi_1+\gamma_1+1)S_\pi}{k(\pi_1+\gamma_1)(\pi_1+\gamma_1+1)},$$ where $S_\pi = \sum_{i=2}^{k} \frac{\pi_i}{\pi_i+\gamma_i}$ and $S_\gamma = \sum_{i=2}^{k} \frac{\gamma_i}{\pi_i+\gamma_i}$. In Section 4.2, we computed $P(\text{pass}(\omega))$, so dividing $P(\text{fail}(\omega_R) \wedge \text{pass}(\omega))$ by $P(\text{pass}(\omega))$ gives the conditional probability that the reverse test order fails given that the first passes. Due to the complexity of the formulas, it is difficult to prove that $P(\text{fail}(\omega_R) \mid \text{pass}(\omega)) > f(T)$ in general, so we sample test orders instead.

When we sample both $\Omega_A(T)$ and $\Omega_C(T)$ for 100,000 random test orders on all 249 OD tests, without reverse (i.e., the standard algorithm) and with reverse of passing test orders (i.e., our change), we find that our change does statistically significantly increase the chance of detecting OD tests. Specifically, for $\Omega_A(T)$, test orders without reverse obtain a mean of 38.6%, while test orders with reverse of passing test orders obtain a mean of 45.3%. Statistical tests for paired samples of the flake rates without and with reverse for $\Omega_A(T)$ show a p-value of $\sim 10^{-38}$ for the dependent Student's t-test and $\sim 10^{-43}$ for the Wilcoxon signed-rank test. Similarly, for $\Omega_C(T)$, test orders without reverse obtain a mean of 38.0%, while test orders with reverse of passing test orders obtain a mean of 45.3%; the corresponding p-values are $\sim 10^{-42}$ for both the dependent Student's t-test and the Wilcoxon signed-rank test.

Based on these positive results, we have changed the iDFlakies tool [22] so that, by default, it runs the reverse of the previous order, instead of running a random order, if the previous order found no new flaky test.

### **5 Generating Test Orders to Cover Test Pairs**

We next discuss our algorithm to generate test orders that systematically cover all test pairs for a given set T with n tests. The motivation is that even with our change to increase the probability to detect OD tests, the randomization-based sampling remains inherently probabilistic and can fail to detect an OD test.

#### **5.1 Special Case: All Orders are Class-Compatible**

We first focus on the special case where we have only one class, or many classes each with only one test, so all $n!$ permutations are class-compatible. For example, for $n = 2$ we can cover both pairs with $\Omega_2 = \{\langle t_1, t_2 \rangle, \langle t_2, t_1 \rangle\}$, and for $n = 4$ we can cover all 12 pairs with 4 test orders $\Omega_4 = \{\langle t_1, t_4, t_2, t_3 \rangle, \langle t_2, t_1, t_3, t_4 \rangle, \langle t_3, t_2, t_4, t_1 \rangle, \langle t_4, t_3, t_1, t_2 \rangle\}$. Recall that $n$ is the minimum number of test orders needed to cover all test pairs, so the solutions for $n = 2$ and $n = 4$ are optimal. The reader is invited to consider whether, for $n = 3$, we can cover all 6 test pairs with just 3 test orders; the answer appears later in this section.

To address this problem, we consider Tuscan squares [7], objects studied in the field of combinatorics. Given a natural number $n$, a Tuscan square consists of $n$ rows, each of which is a permutation of the numbers $\{1, 2, \ldots, n\}$, such that every pair $i, j$ of distinct numbers appears adjacently in some row. Tuscan squares are sometimes called "row-complete Latin squares" [34], but note that Tuscan squares need not have each column be a permutation of all numbers.

A Tuscan square of size $n$ is equivalent to a decomposition of the complete directed graph on $n$ vertices into $n$ Hamiltonian paths [42]. The decomposition for even $n$ has been known since the 19th century and is often attributed to Walecki [26]. The decomposition for odd $n \ge 7$ was published in 1980 by Tillson [42], who presented a beautiful construction for $n = 4m+3$ and a rather involved construction for $n = 4m+1$, with a recursive step and a manually constructed base case for $n = 9$. In brief, Tuscan squares can be constructed for all values of $n$ except $n = 3$ and $n = 5$. We did not find a public implementation for generating Tuscan squares, and considering the complexity of the $n = 4m+1$ case in Tillson's construction, we have made our implementation public [44].
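The even case admits a compact sketch via the classical zigzag construction (Walecki-style): the first row alternates from both ends of the range, and the remaining rows are its cyclic shifts. This is illustrative and covers only even $n$; odd $n \ge 7$ requires Tillson's more involved construction.

```python
# Tuscan square (row-complete Latin square) for even n:
# first row 0, 1, n-1, 2, n-2, ...; remaining rows are cyclic shifts.
# For even n the consecutive differences 1, -2, 3, -4, ... are all
# distinct mod n, so every ordered pair appears adjacently exactly once.

def tuscan_square_even(n):
    assert n >= 2 and n % 2 == 0
    first, lo, hi = [0], 1, n - 1
    while lo <= hi:
        first.append(lo)
        lo += 1
        if lo <= hi:
            first.append(hi)
            hi -= 1
    return [[(x + r) % n for x in first] for r in range(n)]
```

For example, `tuscan_square_even(4)` starts from the row `[0, 1, 3, 2]`, matching the $n = 4$ example above up to renaming.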

We can directly translate the permutations of a Tuscan square into $n$ test orders that cover all test pairs in this special case (where all test pairs are either intra-class pairs of one class or inter-class pairs of $n$ singleton classes). These sets of test orders have the minimal possible cost: $\text{Cost}(\Omega_n) = n(\text{Cost}_0 + \text{Cost}(T))$, substantially lower than $\text{Cost}(\Omega_p)$ for running all test pairs in isolation. For $n = 3$ and $n = 5$, we have to use 4 and 6 test orders, respectively, to cover all test pairs. For example, for $n = 3$ we can cover all 6 pairs with the 4 orders $\{\langle t_1, t_2, t_3 \rangle, \langle t_2, t_1, t_3 \rangle, \langle t_3, t_1 \rangle, \langle t_3, t_2 \rangle\}$.

#### **5.2 General Case**

Algorithm 1 shows the pseudo-code of our algorithm for generating test orders that cover all test pairs in the general case, where we have more than one class and at least one class has more than one test. The main function calls two functions that generate test orders covering intra-class and inter-class test pairs, respectively.

**Algorithm 1:** Generate test orders that cover all intra-test-class and inter-test-class test-method pairs

```
Input: T   # test suite, a set of test methods partitioned into test classes
Output: Ω  # a set of test orders

Function cover_all_pairs():
    Ω = {}  # empty set
    cover_intra_class_pairs()
    cover_inter_class_pairs()

Function cover_intra_class_pairs():
    map = {}  # map each class to all its intra-class orders
    for C ∈ classes(T) do
        map = map ∪ {⟨C, ωC⟩ | ωC ∈ compute_tuscan_square(C)}
    while map ≠ {} do
        ω = ⟨⟩  # empty order
        Cs = {C | ∃ωC. ⟨C, ωC⟩ ∈ map}
        for C ∈ Cs do
            ωC = pick({ωC | ⟨C, ωC⟩ ∈ map})
            map = map \ {⟨C, ωC⟩}
            ω = ω ⊕ ωC  # append order
        Ω = Ω ∪ {ω}

Function cover_inter_class_pairs():
    pairs = {⟨t, t'⟩ | t, t' ∈ T ∧ class(t) ≠ class(t')} \  # from all inter-class pairs..
            {⟨t, t'⟩ | ∃ω ∈ Ω. cover(ω, ⟨t, t'⟩)}           # ..remove those covered by intra-class orders
    while pairs ≠ {} do
        ω = pick(pairs)  # start with a randomly chosen not-covered pair
        pairs = pairs \ {ω}
        while true do
            tp = ω[|ω|−1]  # previously last test
            ts = {t | ⟨tp, t⟩ ∈ pairs ∧ class(t) ∉ classes(ω)}
            if ts = {} then break
            tn = pick(ts)  # next test to extend the order
            pairs = pairs \ {⟨tp, tn⟩}
            ω = ω ⊕ tn
        Ω = Ω ∪ {ω}
```

The function cover_intra_class_pairs generates test orders that cover all intra-class test pairs. For each class, compute_tuscan_square generates test orders of the tests within the class that cover all its intra-class test pairs. These per-class test orders are then appended to form a test order for the entire test suite T. The function pick, invoked on multiple lines, chooses a random element from a set. The outer loop iterates as many times as the maximum number of intra-class test orders for any class. When the loop finishes, Ω contains a set of test orders that cover all intra-class and some inter-class test pairs. Each test order that concatenates tests from l classes covers l − 1 inter-class test pairs. (Using just these test orders, we already detected 44 new OD tests in the test suites from the iDFlakies dataset.) Each intra-class test pair is covered by exactly one test order. Modulo the special cases for n = 3 and n = 5, each covered inter-class pair appears in exactly one test order in Ω, because Tuscan squares satisfy the invariant that each element appears only once as the first and once as the last element across the permutations of a Tuscan square.

The function cover_inter_class_pairs generates more test orders to cover the remaining inter-class test pairs. It uses a greedy algorithm that first initializes a test order with a randomly selected not-covered test pair and then extends the test order with randomly selected not-covered test pairs as long as an appropriate pair exists. Extending the test order as long as possible reduces both the number of test orders and the number of times each test needs to be run.
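The greedy step can be sketched in runnable form as follows; `cls`, `covered`, and the function name are illustrative, and the seeding of `covered` from intra-class orders is omitted:

```python
import random

# Sketch of the greedy inter-class cover (second half of Algorithm 1).
# `cls` maps each test to its class; `covered` holds inter-class pairs
# already covered by the intra-class orders.

def cover_inter_class_pairs(tests, cls, covered, rng=None):
    rng = rng or random.Random(0)
    pairs = {(t, u) for t in tests for u in tests
             if cls[t] != cls[u]} - covered
    orders = []
    while pairs:
        t, u = rng.choice(sorted(pairs))  # seed with a not-covered pair
        pairs.discard((t, u))
        order = [t, u]
        while True:
            tp = order[-1]                 # previously last test
            used = {cls[x] for x in order}
            ts = sorted(x for (p, x) in pairs
                        if p == tp and cls[x] not in used)
            if not ts:
                break
            tn = rng.choice(ts)            # next test to extend the order
            pairs.discard((tp, tn))
            order.append(tn)
        orders.append(order)
    return orders
```

Each outer iteration removes at least its seed pair, so the loop terminates, and every removed pair appears adjacently in some generated order.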

We evaluate our randomized algorithm on the 121 modules from the iDFlakies dataset described in Section 3.1. We use the total cost, which considers the number of test orders and the number of tests in all of those test orders. The number of test orders relates to $\text{Cost}_0$, while the number of tests relates to $\text{Cost}_1$, as defined in Section 3. We run our algorithm 10 times with various random seeds. The coefficient of variation [3] for each module shows that the algorithm is fairly stable: averaged over all modules, it is only 1.1% for the number of test orders and 0.25% for the number of tests.

Compared with $\Omega_p$, which consists of all test orders of just test pairs, our randomized algorithm produces on average only 3.68% as many test orders and runs only 51.8% as many tests. The overall cost of the test orders generated by our randomized algorithm is close to optimal: the number of test orders is reduced by almost two orders of magnitude, and 51.8% of the number of tests is close to the theoretical minimum of 50% of the $\Omega_p$ test orders for $\text{Cost}_1$.

### **6 Conclusion**

Order-dependent (OD) tests are one prominent category of flaky tests. Prior work [22,24,47] has used randomized test orders to detect OD tests. In this paper, we have presented the first analysis of the probability that randomized test orders detect OD tests. We have also proposed a simple change for sampling random test orders to increase the probability of detecting OD tests. We have finally proposed a novel algorithm that systematically explores all consecutive pairs of tests, guaranteeing to find all OD tests that depend on one other test. Our experimental results show that our algorithm runs substantially fewer tests than a naive exploration that runs all pairs of tests. Our runs of some test orders generated by the algorithm detect 44 new OD tests, not detected in prior work [22,24,25] on the same evaluation dataset.

### **Acknowledgments**

We are grateful to Peter Taylor for a StackExchange post [39] that led us to the concept of Tuscan squares. We thank Dragan Stevanović, Wenyu Wang, and Zhengkai Wu for discussions about Tuscan squares and Reed Oei for comments on the paper draft. This work was partially supported by NSF grants CNS-1564274, CNS-1646305, CCF-1763788, and CCF-1816615. We also acknowledge support for research on flaky tests from Facebook and Google.

### **References**



# **Timed Systems**

### **Timed Automata Relaxation for Reachability**

Jaroslav Bendík<sup>1</sup>, Ahmet Sencan<sup>2</sup>, Ebru Aydin Gol<sup>2</sup>, and Ivana Černá<sup>1</sup>

<sup>1</sup> Faculty of Informatics, Masaryk University, Brno, Czech Republic

{xbendik,cerna}@fi.muni.cz

<sup>2</sup> Department of Computer Engineering, Middle East Technical University, Ankara, Turkey

{sencan.ahmet,ebrugol}@metu.edu.tr

**Abstract.** Timed automata (TA) have been shown to be a suitable formalism for modeling real-time systems, and modern model-checking tools allow a designer to check whether a TA complies with a system specification. However, the exact timing constraints of the system are often uncertain during the design phase. Consequently, the designer can build a TA with a correct structure, but the timing constraints need to be tuned to make the TA comply with the specification. In this work, we assume that we are given a TA together with an existential property, such as reachability, that the TA does not satisfy. We propose the novel concept of a minimal sufficient reduction (MSR), which allows us to identify the minimal set S of timing constraints of the TA that needs to be tuned to meet the specification. Moreover, we employ mixed-integer linear programming to actually find a tuning of S that leads to meeting the specification.

**Keywords:** Timed Automata · Relaxation · Design · Reachability.

### **1 Introduction**

A timed automaton (TA) [4] is a finite automaton extended with a set of real-valued variables, called clocks, which measure the passage of time. The clocks enrich the semantics, and the constraints on the clocks restrict the behavior of the automaton, which is particularly important in modeling time-critical systems. Examples of TA models of critical systems include scheduling of real-time systems [30,29,33], medical devices [43,38], and rail-road crossing systems [52].

Model-checking methods allow for verifying whether a given TA meets a given system specification. Contemporary model-checking tools, such as UPPAAL [17] or Imitator [9], have proved to be practically applicable in various industrial case studies [17,10,34]. Unfortunately, during the system design phase, the system information is often incomplete. A designer is often able to build a TA with the correct structure, i.e., exactly capturing the locations and transitions of the modeled system; however, the exact clock (timing) constraints that enable/trigger the transitions are uncertain. Thus, the produced TA often does not meet the specification (i.e., it does not pass model-checking) and needs to be fixed. If the specification declares universal properties, e.g., safety or unavoidability, that need to hold on

© The Author(s) 2021

J. F. Groote and K. G. Larsen (Eds.): TACAS 2021, LNCS 12651, pp. 291–310, 2021. https://doi.org/10.1007/978-3-030-72016-2_16

each trace of the TA, a model-checker either returns "yes", or it returns "no" and generates a trace along which the property is violated. This trace can be used to repair the model in an automated way [42]. However, in the case of existential properties, such as reachability, the property has to hold on some trace of the TA. The model-checker either returns "yes" and generates a witness trace satisfying the property, or it returns just "no" without any additional information that would help the designer to correct the TA.

**Contribution.** In this paper, we study the following problem: given a timed automaton $A$ and a reachability property that is not satisfied by $A$, relax the clock constraints of $A$ such that the resultant automaton $A'$ satisfies the reachability property. Moreover, the goal is to minimize the number of relaxed clock constraints and, secondarily, to minimize the overall change of the timing constants used in the clock constraints. We propose a two-step solution to this problem. In the first step, we identify a minimal sufficient reduction (MSR) of $A$, i.e., an automaton $A''$ that satisfies the reachability property and originates from $A$ by removing only a minimal necessary set of clock constraints. In the second step, instead of completely removing the clock constraints, we employ mixed-integer linear programming (MILP) to find a minimal relaxation of the constraints that leads to satisfaction of the reachability property along a witness path.

The underlying assumption is that, during the design, the most suitable timing constants reflecting the system properties are defined. Thus, our goal is to generate a TA satisfying the reachability property by changing a minimum number of timing constants. Some of the constraints of the initial TA can be strict (no relaxation is possible), which can easily be integrated into the proposed solution. Thus, the proposed method can be viewed as a way to handle design uncertainties: develop a TA $A$ on a best-effort basis and apply our algorithm to find an $A'$ that is as close as possible to $A$ and satisfies the given reachability property.

**Related Work.** Another way to handle uncertainties about timing constants is to build a parametric timed automaton (PTA), i.e., a TA where clock constants can be represented with parameters. Subsequently, a parameter synthesis tool, such as [46,9,26], can be used to find suitable values of the parameters for which the resultant TA satisfies the specification. However, most of the parameter synthesis problems are undecidable [6]. While symbolic algorithms without termination guarantees exist for some subclasses [25,39,12], these algorithms are computationally very expensive compared to model checking (see [5]). Moreover, minimizing the number of modified clock constraints is not straightforward.

A related TA repair problem has been studied in a recent work [7], where the authors also assume that some of the constraints are incorrect. To repair the TA, they parametrize the initial TA and generate parameters by analyzing traces of the TA. However, the authors of [7] do not focus on repairing the TA w.r.t. reachability properties as we do. Instead, their goal is to make the TA compliant with an oracle that decides whether a trace of the TA belongs to a system or not. Thus, their approach cannot handle reachability properties. Furthermore, in [7] the total change of the timing constraints is minimized, whereas we primarily minimize the number of changed constraints and only secondarily the total change.


**Fig. 1.** An example of a timed automaton.

### **2 Preliminaries and Problem Formulation**

#### **2.1 Timed Automata**

A timed automaton (TA) [3,4,44] is a finite-state machine extended with a finite set $C$ of real-valued clocks. A clock $x \in C$ measures the time elapsed since its last reset. In a TA, clock constraints are defined for locations (states) and transitions. A simple clock constraint is of the form $x - y \sim c$, where $x, y \in C \cup \{0\}$, $\sim \in \{<, \leq\}$, and $c \in \mathbb{Z} \cup \{\infty\}$.<sup>3</sup> Simple clock constraints and constraints obtained by combining them with the conjunction operator ($\wedge$) are called clock constraints. The sets of simple and of all clock constraints are denoted by $\Phi_S(C)$ and $\Phi(C)$, respectively. For a clock constraint $\phi \in \Phi(C)$, $S(\phi)$ denotes the set of simple constraints of $\phi$, e.g., $S(x - y < 10 \wedge y \leq 20) = \{x - y < 10,\ y \leq 20\}$. A clock valuation $v : C \to \mathbb{R}_{\geq 0}$ assigns a non-negative real value to each clock. The notation $v \models \phi$ denotes that the clock constraint $\phi$ evaluates to true when each clock $x$ is replaced with $v(x)$. For a clock valuation $v$ and $d \in \mathbb{R}_{\geq 0}$, $v + d$ is the clock valuation obtained by delaying each clock by $d$, i.e., $(v + d)(x) = v(x) + d$ for each $x \in C$.
For $\lambda \subseteq C$, $v[\lambda := 0]$ is the clock valuation obtained after resetting each clock from $\lambda$, i.e., $v[\lambda := 0](x) = 0$ for each $x \in \lambda$ and $v[\lambda := 0](x) = v(x)$ for each $x \in C \setminus \lambda$.
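
To make these definitions concrete, here is a minimal Python sketch of clock valuations, simple constraints, satisfaction, delay, and reset. All names and the encoding are ours, not the paper's; following footnote 3, the pseudo-clock `"0"` lets single-clock constraints such as $x \leq c$ fit the $x - y \sim c$ shape.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SimpleConstraint:
    """x - y ~ c with ~ in {<, <=}; y == "0" encodes the constant clock 0."""
    x: str
    y: str
    strict: bool    # True for <, False for <=
    c: float        # threshold, may be float("inf")

def value(v, clk):
    # the pseudo-clock "0" always has value 0
    return 0.0 if clk == "0" else v[clk]

def satisfies(v, phi):
    """v |= phi for a clock constraint phi given as a set of simple constraints."""
    for sc in phi:
        diff = value(v, sc.x) - value(v, sc.y)
        if not (diff < sc.c if sc.strict else diff <= sc.c):
            return False
    return True

def delay(v, d):
    """(v + d)(x) = v(x) + d for every clock x."""
    return {clk: t + d for clk, t in v.items()}

def reset(v, lam):
    """v[lam := 0]: clocks in lam become 0, the others keep their value."""
    return {clk: (0.0 if clk in lam else t) for clk, t in v.items()}
```

For instance, with $v(x) = 1$, $v(y) = 0.5$, the conjunction $x \leq 10 \wedge x - y < 2$ is satisfied by $v$ but not by $v + 10$.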

**Definition 1 (Timed Automata).** A timed automaton $A = (L, l_0, C, \Delta, Inv)$ is a tuple, where $L$ is a finite set of locations, $l_0 \in L$ is the initial location, $C$ is a finite set of clocks, $\Delta \subseteq L \times 2^C \times \Phi(C) \times L$ is a finite transition relation, and $Inv : L \to \Phi(C)$ is an invariant function.

For a transition $e = (l_s, \lambda, \phi, l_t) \in \Delta$, $l_s$ is the source location, $l_t$ is the target location, $\lambda$ is the set of clocks reset on $e$, and $\phi$ is the guard (i.e., a clock constraint) tested for enabling $e$. The semantics of a TA is given by a labelled transition system (LTS). An LTS is a tuple $T = (S, s_0, \Sigma, \rightarrow)$, where $S$ is a set of states, $s_0 \in S$ is an initial state, $\Sigma$ is a set of symbols, and $\rightarrow \subseteq S \times \Sigma \times S$ is a transition relation. A transition $(s, a, s') \in \rightarrow$ is also written $s \xrightarrow{a} s'$.

**Definition 2 (LTS semantics for TA).** Given a TA $A = (L, l_0, C, \Delta, Inv)$, the labelled transition system $T(A) = (S, s_0, \Sigma, \rightarrow)$ is defined as follows:

<sup>3</sup> Simple constraints are only defined as upper bounds to ease the presentation. This definition is not restrictive, since $x - y \geq c$ and $x \geq c$ are equivalent to $y - x \leq -c$ and $0 - x \leq -c$, respectively. A similar argument holds for strict inequality ($>$).

- delay transition: $(l, v) \xrightarrow{d} (l, v + d)$ if $v + d \models Inv(l)$
- discrete transition: $(l, v) \xrightarrow{act} (l', v')$ if there exists $(l, \lambda, \phi, l') \in \Delta$ such that $v \models \phi$, $v' = v[\lambda := 0]$, and $v' \models Inv(l')$.

The notation $s \rightarrow_d s'$ is used to denote a delay transition of duration $d$ followed by a discrete transition from $s$ to $s'$, i.e., $s \xrightarrow{d} s'' \xrightarrow{act} s'$ for some intermediate state $s''$. A run $\rho$ of $A$ is either a finite or an infinite alternating sequence of delay and discrete transitions, i.e., $\rho = s_0 \rightarrow_{d_0} s_1 \rightarrow_{d_1} s_2 \rightarrow_{d_2} \cdots$. The set of all runs of $A$ is denoted by $[\![A]\!]$.

A path $\pi$ of $A$ is an alternating sequence of locations and transitions, $\pi = l_0, e_1, l_1, e_2, \ldots$, where $e_{i+1} = (l_i, \lambda_{i+1}, \phi_{i+1}, l_{i+1}) \in \Delta$ for each $i \geq 0$. A path $\pi = l_0, e_1, l_1, e_2, \ldots$ is said to be realizable if there exists a delay sequence $d_0, d_1, \ldots$ such that $(l_0, \mathbf{0}) \rightarrow_{d_0} (l_1, v_1) \rightarrow_{d_1} (l_2, v_2) \rightarrow_{d_2} \cdots$ is a run of $A$ and for every $i \geq 1$, the $i$-th discrete transition is taken according to $e_i$, i.e., $e_i = (l_{i-1}, \lambda_i, \phi_i, l_i)$, $v_{i-1} + d_{i-1} \models \phi_i$, $v_i = (v_{i-1} + d_{i-1})[\lambda_i := 0]$, and $v_i \models Inv(l_i)$.

Given a TA $A$, a subset $L_T \subset L$ of its locations is reachable on $A$ if there exists $\rho = (l_0, \mathbf{0}) \rightarrow_{d_0} (l_1, v_1) \rightarrow_{d_1} \cdots \rightarrow_{d_{n-1}} (l_n, v_n) \in [\![A]\!]$ such that $l_n \in L_T$; otherwise, $L_T$ is unreachable. The reachability problem is decidable and implemented in various verification tools, e.g., [17,9]. The verifier either returns "No" when the location is unreachable, or it generates a run (witness) reaching the set $L_T$.

Example 1. Figure 1 illustrates a TA with 8 locations $\{l_0, \ldots, l_7\}$, 9 transitions $\{e_1, \ldots, e_9\}$, an initial location $l_0$, and an unreachable set of locations $L_T = \{l_4\}$.

#### **2.2 Timed Automata Relaxations and Reductions**

For a timed automaton $A = (L, l_0, C, \Delta, Inv)$, the set of pairs of transitions and their associated simple constraints is defined in (1), and the set of pairs of locations and their associated simple constraints is defined in (2).

$$\Psi(\Delta) = \{ (e, \varphi) \mid e = (l\_s, \lambda, \phi, l\_t) \in \Delta, \varphi \in \mathcal{S}(\phi) \}\tag{1}$$

$$\Psi(Inv) = \{ (l, \varphi) \mid l \in L, \varphi \in \mathcal{S}(Inv(l)) \}\tag{2}$$

**Definition 3 (constraint-relaxation).** Let $\phi \in \Phi(C)$ be a constraint over $C$, $\Theta \subseteq S(\phi)$ be a subset of its simple constraints, and $\mathbf{r} : \Theta \to \mathbb{N} \cup \{\infty\}$ be a positive valued relaxation valuation. The relaxed constraint is defined as:

$$R(\phi, \Theta, \mathbf{r}) = \left(\bigwedge\_{\varphi \in \mathcal{S}(\phi) \backslash \Theta} \varphi\right) \wedge \left(\bigwedge\_{\varphi = x - y \sim c \in \Theta} x - y \sim c + \mathbf{r}(\varphi)\right) \tag{3}$$

Intuitively, $R(\phi, \Theta, \mathbf{r})$ relaxes only the thresholds of the simple constraints from $\Theta$ with respect to $\mathbf{r}$, e.g., $R(x - y \leq 10 \wedge y < 20, \{y < 20\}, \mathbf{r}) = x - y \leq 10 \wedge y < 23$, where $\mathbf{r}(y < 20) = 3$. Setting a threshold to $\infty$ amounts to removing the corresponding simple constraint, e.g., $R(x - y \leq 10 \wedge y < 20, \{y < 20\}, \mathbf{r}) = x - y \leq 10$, where $\mathbf{r}(y < 20) = \infty$. Note that $R(\phi, \Theta, \mathbf{r}) = \phi$ when $\Theta$ is empty.
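
The relaxation operator $R$ can be sketched in a few lines of Python. The tuple encoding and the function name are ours, for illustration only:

```python
import math

# A simple constraint is a tuple (x, y, op, c) meaning x - y op c with
# op in {"<", "<="}; a clock constraint phi is a frozenset of such tuples.
def relax(phi, theta, r):
    """R(phi, theta, r): raise the threshold of every constraint in theta
    by r[constraint]; a relaxation of infinity removes the constraint."""
    out = set()
    for sc in phi:
        if sc in theta:
            x, y, op, c = sc
            if r[sc] == math.inf:
                continue              # threshold set to infinity: dropped
            out.add((x, y, op, c + r[sc]))
        else:
            out.add(sc)               # untouched simple constraint
    return frozenset(out)
```

Replaying the worked example: relaxing $y < 20$ by 3 in $x - y \leq 10 \wedge y < 20$ yields $x - y \leq 10 \wedge y < 23$, and relaxing it by $\infty$ yields $x - y \leq 10$.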

**Definition 4 ((**D, I, **r)-relaxation).** Let $A = (L, l_0, C, \Delta, Inv)$ be a TA, $D \subseteq \Psi(\Delta)$ and $I \subseteq \Psi(Inv)$ be transition and location constraint sets, and $\mathbf{r} : D \cup I \to \mathbb{N} \cup \{\infty\}$ be a positive valued relaxation valuation. The $(D, I, \mathbf{r})$-relaxation of $A$, denoted $A_{\langle D,I,\mathbf{r}\rangle}$, is a TA $A' = (L', l'_0, C', \Delta', Inv')$ such that:

- $L' = L$, $l'_0 = l_0$, $C' = C$,
- $\Delta' = \{(l, \lambda, R(\phi, D|_e, \mathbf{r}|_e), l') \mid e = (l, \lambda, \phi, l') \in \Delta\}$, where $D|_e = \{\varphi \mid (e, \varphi) \in D\}$ and $\mathbf{r}|_e(\varphi) = \mathbf{r}((e, \varphi))$,
- $Inv'(l) = R(Inv(l), I|_l, \mathbf{r}|_l)$ for each $l \in L$, where $I|_l = \{\varphi \mid (l, \varphi) \in I\}$ and $\mathbf{r}|_l(\varphi) = \mathbf{r}((l, \varphi))$.
Intuitively, the TA $A_{\langle D,I,\mathbf{r}\rangle}$ emerges from $A$ by relaxing the guards of the transitions from the set $D$ and the invariants of the locations from $I$ with respect to $\mathbf{r}$. In the special case of setting the threshold of each constraint from $D$ and $I$ to $\infty$, i.e., when $\mathbf{r}(a) = \infty$ for each $a \in D \cup I$, the corresponding simple constraints are effectively removed; this is called a $(D, I)$-reduction and denoted by $A_{\langle D,I\rangle}$. Note that $A = A_{\langle \emptyset, \emptyset \rangle}$.

**Proposition 1.** Let $A = (L, l_0, C, \Delta, Inv)$ be a timed automaton, $D \subseteq \Psi(\Delta)$ and $I \subseteq \Psi(Inv)$ be sets of simple guard and invariant constraints, and $\mathbf{r} : D \cup I \to \mathbb{N} \cup \{\infty\}$ be a relaxation valuation. Then $[\![A]\!] \subseteq [\![A_{\langle D,I,\mathbf{r}\rangle}]\!]$.

Proof. Observe that for a clock constraint $\phi \in \Phi(C)$, a subset of its simple constraints $\Theta \subseteq S(\phi)$, a relaxation valuation $\mathbf{r}'$ for $\Theta$, and the relaxed constraint $R(\phi, \Theta, \mathbf{r}')$ as in Definition 3, it holds for any clock valuation $v$ that $v \models \phi \implies v \models R(\phi, \Theta, \mathbf{r}')$. Now, consider a run $\rho = (l_0, \mathbf{0}) \rightarrow_{d_0} (l_1, v_1) \rightarrow_{d_1} (l_2, v_2) \rightarrow_{d_2} \cdots \in [\![A]\!]$. Let $\pi = l_0, e_1, l_1, e_2, \ldots$ with $e_i = (l_{i-1}, \lambda_i, \phi_i, l_i) \in \Delta$ for each $i \geq 1$ be the path realized as $\rho$ via the delay sequence $d_0, d_1, \ldots$. By Definition 4, for each $(l, \lambda, \phi, l') \in \Delta$, there is $(l, \lambda, R(\phi, D|_e, \mathbf{r}|_e), l') \in \Delta'$. We define the path induced by $\pi$ on $A_{\langle D,I,\mathbf{r}\rangle}$ as:

$$M(\pi) = l\_0, \left(l\_0, \lambda\_1, R(\phi\_1, D|\_{e\_1}, \mathbf{r}|\_{e\_1}), l\_1\right), l\_1, \left(l\_1, \lambda\_2, R(\phi\_2, D|\_{e\_2}, \mathbf{r}|\_{e\_2}), l\_2\right), \dots \quad \text{(4)}$$

For each $i = 0, \ldots, n-1$ it holds that $v_i \models R(Inv(l_i), I|_{l_i}, \mathbf{r}|_{l_i})$, $v_i + d_i \models R(Inv(l_i), I|_{l_i}, \mathbf{r}|_{l_i})$, and $v_i + d_i \models R(\phi_{i+1}, D|_{e_{i+1}}, \mathbf{r}|_{e_{i+1}})$. Thus $M(\pi)$ is realizable on $A_{\langle D,I,\mathbf{r}\rangle}$ via the same delay sequence and $\rho \in [\![A_{\langle D,I,\mathbf{r}\rangle}]\!]$. As $\rho \in [\![A]\!]$ is arbitrary, we conclude that $[\![A]\!] \subseteq [\![A_{\langle D,I,\mathbf{r}\rangle}]\!]$.

#### **2.3 Problem Statement**

Problem 1. Given a TA $A = (L, l_0, C, \Delta, Inv)$ and a set of target locations $L_T \subset L$ that is unreachable on $A$, find a $(D, I, \mathbf{r})$-relaxation $A_{\langle D,I,\mathbf{r}\rangle}$ of $A$ such that $L_T$ is reachable on $A_{\langle D,I,\mathbf{r}\rangle}$. Moreover, the goal is to identify a $(D, I, \mathbf{r})$-relaxation that minimizes the number $|D \cup I|$ of relaxed constraints and, secondarily, minimizes the overall change of the clock constraints $\sum_{c \in D \cup I} \mathbf{r}(c)$.

We propose a two-step solution to this problem. In the first step, we identify a subset $D \cup I$ of the simple constraints $\Psi(\Delta) \cup \Psi(Inv)$ such that $L_T$ is reachable on the $(D, I)$-reduction $A_{\langle D,I\rangle}$ and $|D \cup I|$ is minimized. Consequently, we can obtain a witness path of the reachability on $A_{\langle D,I\rangle}$ from the verifier. The path would be realizable on $A$ if we removed the constraints $D \cup I$. In the second step, instead of completely removing the constraints $D \cup I$, we find a relaxation valuation $\mathbf{r} : D \cup I \to \mathbb{N} \cup \{\infty\}$ such that the path found in the first step is realizable on $A_{\langle D,I,\mathbf{r}\rangle}$. To find $\mathbf{r}$, we introduce relaxation parameters for the constraints in $D \cup I$. Subsequently, we solve an MILP problem to find a valuation of the parameters, i.e., $\mathbf{r}$, that makes the path realizable on $A_{\langle D,I,\mathbf{r}\rangle}$ and minimizes $\sum_{c \in D \cup I} \mathbf{r}(c)$. Note that it might be the case that the reduction $A_{\langle D,I\rangle}$ contains multiple realizable paths that lead to $L_T$, and another path might result in a smaller overall change. Also, there might exist another candidate subset $D' \cup I'$ with $|D' \cup I'| = |D \cup I|$ that would lead to a smaller overall change. While our approach can be applied to a number of paths and a number of candidate subsets $D \cup I$, processing all of them can be practically intractable.
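
To give a feel for the second step, the relaxation search over a fixed witness path can be cast as a small MILP. The following shape is our illustrative sketch, not necessarily the paper's exact encoding from Section 4; $\mathrm{rst}_x(i)$ is notation we introduce here for the step at which clock $x$ was last reset before step $i$. With one delay variable $d_j \geq 0$ per step, the value of $x$ when the $i$-th constraint is checked is the sum of the delays since $\mathrm{rst}_x(i)$, so each relaxed constraint is linear:

```latex
\begin{aligned}
\min \;& \sum_{c \in D \cup I} \mathbf{r}(c) \\
\text{s.t.} \;& \sum_{j=\mathrm{rst}_x(i)}^{i-1} d_j \;-\; \sum_{j=\mathrm{rst}_y(i)}^{i-1} d_j \;\le\; c + \mathbf{r}(\varphi)
  && \text{for each } \varphi = (x - y \le c) \text{ checked at step } i,\\
& d_j \ge 0, \qquad \mathbf{r}(\varphi) \ge 0 \text{ and integer},
  \qquad \mathbf{r}(\varphi) = 0 \text{ for } \varphi \notin D \cup I.
\end{aligned}
```

Strict inequalities ($<$) require an additional small-constant treatment; the integrality of $\mathbf{r}$ is what makes the program an MILP rather than a plain LP.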

### **3 Minimal Sufficient (***D***,***I***)-Reductions**

Throughout this section, we simply write a reduction when talking about a $(D,I)$-reduction of $A$. To name a reduction, we either use capital letters, e.g., $M, N, K$, or we use the notation $A_{\langle D,I\rangle}$ to also specify the sets $D, I$ of simple clock constraints. Given a reduction $N = A_{\langle D,I\rangle}$, $|N|$ denotes the cardinality $|D \cup I|$. Furthermore, $R_A$ denotes the set of all reductions. We define a partial order $\subseteq$ on $R_A$ by $A_{\langle D,I\rangle} \subseteq A_{\langle D',I'\rangle}$ iff $D \cup I \subseteq D' \cup I'$. Similarly, we write $A_{\langle D,I\rangle} \subsetneq A_{\langle D',I'\rangle}$ iff $D \cup I \subsetneq D' \cup I'$. We say that a reduction $A_{\langle D,I\rangle}$ is a sufficient reduction (w.r.t. $A$ and $L_T$) iff $L_T$ is reachable on $A_{\langle D,I\rangle}$; otherwise, $A_{\langle D,I\rangle}$ is an insufficient reduction. A crucial observation for our work is that the property of being a sufficient reduction is monotone w.r.t. the partial order:

**Proposition 2.** Let $A_{\langle D,I\rangle}$ and $A_{\langle D',I'\rangle}$ be reductions such that $A_{\langle D,I\rangle} \subseteq A_{\langle D',I'\rangle}$. If $A_{\langle D,I\rangle}$ is sufficient, then $A_{\langle D',I'\rangle}$ is also sufficient.

Proof. Note that $A_{\langle D',I'\rangle}$ is a $(D' \setminus D, I' \setminus I)$-reduction of $A_{\langle D,I\rangle}$. By Proposition 1, $[\![A_{\langle D,I\rangle}]\!] \subseteq [\![A_{\langle D',I'\rangle}]\!]$, i.e., the run of $A_{\langle D,I\rangle}$ that witnesses the reachability of $L_T$ is also a run of $A_{\langle D',I'\rangle}$.

**Definition 5 (MSR).** A sufficient reduction $A_{\langle D,I\rangle}$ is a minimal sufficient reduction (MSR) iff there is no $c \in D \cup I$ such that the reduction $A_{\langle D\setminus\{c\}, I\setminus\{c\}\rangle}$ is sufficient. Equivalently, due to Proposition 2, $A_{\langle D,I\rangle}$ is an MSR iff there is no sufficient reduction $A_{\langle D',I'\rangle}$ such that $A_{\langle D',I'\rangle} \subsetneq A_{\langle D,I\rangle}$.

Recall that a reduction $A_{\langle D,I\rangle}$ is determined by $D \subseteq \Psi(\Delta)$ and $I \subseteq \Psi(Inv)$. Consequently, $|R_A| = 2^{|\Psi(\Delta) \cup \Psi(Inv)|}$. Moreover, there can be up to $\binom{k}{\lfloor k/2 \rfloor}$ MSRs, where $k = |\Psi(\Delta) \cup \Psi(Inv)|$ (see Sperner's theorem [51]). Also note that the minimality of a reduction does not imply a minimum number of simple clock constraints that are reduced by the reduction; there can exist two MSRs, $M$ and $N$, such that $|M| < |N|$. Since our overall goal is to relax $A$ as little as possible, we identify a minimum MSR, i.e., an MSR $M$ such that there is no MSR $M'$ with $|M'| < |M|$, and then use the minimum MSR for the MILP part (Section 4) of our overall approach. There can also be up to $\binom{k}{\lfloor k/2 \rfloor}$ minimum MSRs.
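
For a feel of these magnitudes, a tiny computation (the value of $k$ is illustrative, not from the paper):

```python
from math import comb

# With k simple clock constraints there are 2**k reductions in total and,
# by Sperner's theorem, at most comb(k, k // 2) pairwise-incomparable
# reductions -- hence at most that many MSRs.
k = 10
print(2 ** k)           # number of reductions: 1024
print(comb(k, k // 2))  # upper bound on the number of MSRs: 252
```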

Example 2. Assume the TA $A$ and $L_T = \{l_4\}$ from Example 1 (Fig. 1). There are 24 MSRs, 4 of which are minimum. For example, $A_{\langle D,I\rangle}$ with $D = \{(e_5, x \geq 25)\}$ and $I = \{(l_3, u \leq 26)\}$ is a minimum MSR, and $A_{\langle D',I'\rangle}$ with $D' = \{(e_9, y \leq 15), (e_7, z \leq 15)\}$ and $I' = \{(l_6, x \leq 10)\}$ is a non-minimum MSR.

#### **3.1 Base Scheme For Computing a Minimum MSR**

Algorithm 1 shows a high-level scheme of our approach for computing a minimum MSR. The algorithm iteratively identifies an ordered set of MSRs, $|M_1| > |M_2| > \cdots > |M_k|$, such that the last MSR $M_k$ is a minimum MSR. Each of the MSRs, say $M_i$, is identified in two steps. First, the algorithm finds a seed, i.e., a reduction $N_i$ such that $N_i$ is sufficient and $|N_i| < |M_{i-1}|$. Second, the algorithm shrinks $N_i$ into an MSR $M_i$ such that $M_i \subseteq N_i$ (and thus $|M_i| \leq |N_i|$). The initial seed $N_1$ is $A_{\langle \Psi(\Delta), \Psi(Inv)\rangle}$, i.e., the reduction that removes all simple clock constraints (which makes all locations of $A$ trivially reachable). Once there is no sufficient reduction $N_i$ with $|N_i| < |M_{i-1}|$, we know that $M_{i-1} = M_k$ is a minimum MSR.
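
The seed-and-shrink loop can be sketched as follows. This is our simplification, not the paper's Algorithm 1 verbatim: `sufficient(R)` stands for the model-checking oracle applied to the reduction removing the constraint set `R`, the seed search is brute force (the paper prunes it with the sets of known MSRs and insufficient reductions), and the inner `shrink` is a naive stand-in for Algorithm 2.

```python
from itertools import combinations

def minimum_msr(all_constraints, sufficient):
    """Repeatedly find a sufficient reduction (seed) smaller than the best
    MSR so far and shrink it; stop when no smaller seed exists."""
    def shrink(seed):
        # drop constraints one by one while sufficiency is preserved
        msr = set(seed)
        for c in list(seed):
            if sufficient(msr - {c}):
                msr.discard(c)
        return msr

    best = shrink(set(all_constraints))   # initial seed removes everything
    while True:
        seed = next((set(s)
                     for k in range(len(best))
                     for s in combinations(all_constraints, k)
                     if sufficient(set(s))),
                    None)
        if seed is None:
            return best                   # no smaller sufficient reduction: minimum MSR
        best = shrink(seed)
```

For a toy monotone oracle where removing $\{1,2\}$ together or $\{3\}$ alone suffices, the minimum MSR is $\{3\}$.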

Note that the algorithm also maintains two auxiliary sets, M and I, to store all identified MSRs and insufficient reductions, respectively. The two sets are used during the process of finding and shrinking a seed which we describe below.

#### **3.2 Shrinking a Seed**

Our approach for shrinking a seed $N$ into an MSR $M$ is based on two concepts: a critical simple clock constraint and a reduction core.

**Definition 6 (critical constraint).** Given a sufficient reduction $A_{\langle D,I\rangle}$, a simple clock constraint $c$ is critical for $A_{\langle D,I\rangle}$ iff $A_{\langle D\setminus\{c\}, I\setminus\{c\}\rangle}$ is insufficient.

**Proposition 3.** If $c \in D \cup I$ is critical for a sufficient reduction $A_{\langle D,I\rangle}$, then $c$ is critical for every sufficient reduction $A_{\langle D',I'\rangle}$ with $A_{\langle D',I'\rangle} \subseteq A_{\langle D,I\rangle}$. Moreover, by Definitions 5 and 6, $A_{\langle D,I\rangle}$ is an MSR iff every $c \in D \cup I$ is critical for $A_{\langle D,I\rangle}$.

# **Algorithm 2:** shrink($A_{\langle D,I\rangle}$, $\mathcal{I}$)

**1** $X \leftarrow \emptyset$
**2** **while** $(D \cup I) \neq X$ **do**
**3** &nbsp;&nbsp; $c \leftarrow$ pick a simple clock constraint from $(D \cup I) \setminus X$
**4** &nbsp;&nbsp; **if** $A_{\langle D\setminus\{c\}, I\setminus\{c\}\rangle} \notin \mathcal{I}$ *and* $A_{\langle D\setminus\{c\}, I\setminus\{c\}\rangle}$ *is sufficient* **then**
**5** &nbsp;&nbsp;&nbsp;&nbsp; $\rho \leftarrow$ a witness run of the sufficiency of $A_{\langle D\setminus\{c\}, I\setminus\{c\}\rangle}$
**6** &nbsp;&nbsp;&nbsp;&nbsp; $A_{\langle D,I\rangle} \leftarrow$ the reduction core of $A_{\langle D\setminus\{c\}, I\setminus\{c\}\rangle}$ w.r.t. $\rho$
**7** &nbsp;&nbsp; **else**
**8** &nbsp;&nbsp;&nbsp;&nbsp; $X \leftarrow X \cup \{c\}$
**9** &nbsp;&nbsp;&nbsp;&nbsp; $\mathcal{I} \leftarrow \mathcal{I} \cup \{N \in R_A \mid N \subseteq A_{\langle D\setminus\{c\}, I\setminus\{c\}\rangle}\}$
**10** **return** $A_{\langle D,I\rangle}$, $\mathcal{I}$

Proof. By contradiction, assume that $c$ is critical for $A_{\langle D,I\rangle}$ but not for $A_{\langle D',I'\rangle}$, i.e., $A_{\langle D\setminus\{c\}, I\setminus\{c\}\rangle}$ is insufficient and $A_{\langle D'\setminus\{c\}, I'\setminus\{c\}\rangle}$ is sufficient. As $A_{\langle D',I'\rangle} \subseteq A_{\langle D,I\rangle}$, we have $A_{\langle D'\setminus\{c\}, I'\setminus\{c\}\rangle} \subseteq A_{\langle D\setminus\{c\}, I\setminus\{c\}\rangle}$. By Proposition 2, if the reduction $A_{\langle D'\setminus\{c\}, I'\setminus\{c\}\rangle}$ is sufficient, then $A_{\langle D\setminus\{c\}, I\setminus\{c\}\rangle}$ is also sufficient, a contradiction.

**Definition 7 (reduction core).** Let $A_{\langle D,I\rangle}$ be a sufficient reduction, $\rho$ a witness run of the sufficiency (i.e., of the reachability of $L_T$ on $A_{\langle D,I\rangle}$), and $\pi$ the path corresponding to $\rho$. Furthermore, let $M(\pi) = l_0, e_1, \ldots, e_n, l_n$ be the path corresponding to $\pi$ on the original TA $A$, defined as in (4). The reduction core of $A_{\langle D,I\rangle}$ w.r.t. $\rho$ is the reduction $A_{\langle D',I'\rangle}$ where $D' = \{(e, \varphi) \mid (e, \varphi) \in D \wedge e = e_i \text{ for some } 1 \leq i \leq n\}$ and $I' = \{(l, \varphi) \mid (l, \varphi) \in I \wedge l = l_i \text{ for some } 0 \leq i \leq n\}$.

Intuitively, the reduction core of $A_{\langle D,I\rangle}$ w.r.t. $\rho$ reduces from $A$ only those simple clock constraints that appear on the witness path in $A$.

**Proposition 4.** Let $A_{\langle D,I\rangle}$ be a sufficient reduction, $\rho$ the witness of the reachability of $L_T$ on $A_{\langle D,I\rangle}$, and $A_{\langle D',I'\rangle}$ the reduction core of $A_{\langle D,I\rangle}$ w.r.t. $\rho$. Then $A_{\langle D',I'\rangle}$ is a sufficient reduction and $A_{\langle D',I'\rangle} \subseteq A_{\langle D,I\rangle}$.

Proof. By Definition 7, $D' \subseteq D$ and $I' \subseteq I$, thus $A_{\langle D',I'\rangle} \subseteq A_{\langle D,I\rangle}$. As for the sufficiency of $A_{\langle D',I'\rangle}$, we only sketch the proof. Intuitively, both $A_{\langle D,I\rangle}$ and $A_{\langle D',I'\rangle}$ originate from $A$ by only removing some simple clock constraints ($D \cup I$ and $D' \cup I'$, respectively), i.e., the graph structure of $A_{\langle D,I\rangle}$ and $A_{\langle D',I'\rangle}$ is the same; however, some corresponding paths of $A_{\langle D,I\rangle}$ and $A_{\langle D',I'\rangle}$ differ in the constraints that appear on the paths. By Definition 7, the path $\pi$ that corresponds to the witness run $\rho$ of $A_{\langle D,I\rangle}$ is also a path of $A_{\langle D',I'\rangle}$. Since the realizability of a path depends only on the constraints along the path, if $\pi$ is realizable on $A_{\langle D,I\rangle}$, then $\pi$ is also realizable on $A_{\langle D',I'\rangle}$.

Our approach for shrinking a sufficient reduction $N$ is shown in Algorithm 2. The algorithm iteratively maintains a sufficient reduction $A_{\langle D,I\rangle}$ and a set $X$ of known critical constraints for $A_{\langle D,I\rangle}$. Initially, $A_{\langle D,I\rangle} = N$ and $X = \emptyset$. In each iteration, the algorithm picks a simple clock constraint $c \in (D \cup I) \setminus X$ and checks the reduction $A_{\langle D\setminus\{c\}, I\setminus\{c\}\rangle}$ for sufficiency. If $A_{\langle D\setminus\{c\}, I\setminus\{c\}\rangle}$ is insufficient, the algorithm adds $c$ to $X$. Otherwise, if $A_{\langle D\setminus\{c\}, I\setminus\{c\}\rangle}$ is sufficient, the algorithm obtains a witness run $\rho$ of the sufficiency from the verifier and reduces $A_{\langle D,I\rangle}$ to the corresponding reduction core. The algorithm terminates when $(D \cup I) = X$. An invariant of the algorithm is that every $c \in X$ is critical for $A_{\langle D,I\rangle}$. Thus, when $(D \cup I) = X$, $A_{\langle D,I\rangle}$ is an MSR (Proposition 3).

Note that the algorithm also uses the set $\mathcal{I}$ of known insufficient reductions. In particular, before calling a verifier to check a reduction for sufficiency (line 4), the algorithm first checks (in a lazy manner) whether the reduction is already known to be insufficient. Also, whenever the algorithm determines a reduction $A_{\langle D\setminus\{c\}, I\setminus\{c\}\rangle}$ to be insufficient, it adds $A_{\langle D\setminus\{c\}, I\setminus\{c\}\rangle}$ and every $N$ with $N \subseteq A_{\langle D\setminus\{c\}, I\setminus\{c\}\rangle}$ to $\mathcal{I}$ (by Proposition 2, every such $N$ is also insufficient).
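
The shrinking loop with the reduction-core shortcut and the cache of insufficient reductions can be sketched as follows. The names are ours; `check(R)` stands in for the verifier: it returns the reduction core (the constraints of `R` lying on a witness path) if the reduction removing `R` is sufficient, and `None` otherwise.

```python
def shrink(seed, check):
    """Sketch of Algorithm 2: shrink a sufficient constraint set to an MSR."""
    current = set(seed)
    critical = set()               # the set X of known critical constraints
    insufficient = []              # cache of sets known to be insufficient
    while current != critical:
        c = next(iter(current - critical))
        candidate = current - {c}
        # lazy check against the cache before calling the verifier
        if any(candidate <= bad for bad in insufficient):
            core = None
        else:
            core = check(candidate)
        if core is not None:
            current = core         # reduction core: core <= candidate, still sufficient
        else:
            critical.add(c)        # c is critical for `current`
            insufficient.append(candidate)
    return current
```

With a toy oracle whose witness path uses only constraints `a` and `b`, shrinking `{a, b, c, d}` yields `{a, b}` in a handful of verifier calls, regardless of the pick order.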

#### **3.3 Finding a Seed**

We now describe the procedure findSeed. The input is the latest identified MSR $M$, the set $\mathcal{M}$ of known MSRs, and the set $\mathcal{I}$ of known insufficient reductions. The output is a seed, i.e., a sufficient reduction $N$ such that $|N| < |M|$, or null if there is no seed. Let us denote by CAND the set of all candidate seeds, i.e., $\mathrm{CAND} = \{N \in R_A \mid |N| < |M|\}$. A brute-force approach would be to check individual reductions in CAND for sufficiency until a sufficient one is found; however, this can be practically intractable since $|\mathrm{CAND}| = \sum_{i=1}^{|M|} \binom{|\Psi(\Delta) \cup \Psi(Inv)|}{i-1}$.

We provide three observations to prune the set CAND of candidates that need to be tested for being a seed. The first observation exploits the set 𝓘 of already known insufficient reductions: no N ∈ 𝓘 can be a seed. The second observation exploits the set 𝓜 of already known MSRs. By the definition of an MSR, for every M′ ∈ 𝓜 and every N such that N ⊊ M′, the reduction N is necessarily insufficient and hence cannot be a seed. The third observation is stated below:

**Observation 1.** For every sufficient reduction N ∈ CAND there exists a sufficient reduction N′ ∈ CAND such that N ⊆ N′ and |N′| = |M| − 1.

Proof. If |N| = |M| − 1, then N = N′. For the other case, when |N| < |M| − 1, let N = A⟨D_N, I_N⟩ and M = A⟨D_M, I_M⟩. We construct N′ = A⟨D_N′, I_N′⟩ by adding arbitrary (|M| − |N|) − 1 simple clock constraints from (D_M ∪ I_M) \ (D_N ∪ I_N) to (D_N ∪ I_N), i.e., D_N ∪ I_N ⊆ D_N′ ∪ I_N′ ⊆ (D_M ∪ I_M ∪ D_N ∪ I_N) and |D_N′ ∪ I_N′| = |M| − 1. By the definition of CAND, N′ ∈ CAND. Moreover, since N ⊊ N′ and N is sufficient, N′ is also sufficient (Proposition 2).

Based on the above observations, we build a set 𝓒 of indispensable candidate seeds that need to be tested for sufficiency:

$$\mathcal{C} = \{ N \in \mathcal{R}_{\mathcal{A}} \mid N \notin \mathcal{I} \land \forall M' \in \mathcal{M}.\ N \not\subseteq M' \land |N| = |M| - 1 \}\tag{5}$$

The procedure findSeed, shown in Algorithm 3, in each iteration picks a reduction N ∈ 𝓒 and checks it for sufficiency (via the verifier). If N is sufficient, findSeed returns N as the seed. Otherwise, when N is insufficient, the algorithm first attempts to enlarge N into an insufficient reduction E such that N ⊆ E. By Proposition 2, every reduction N′ such that N′ ⊆ E is also insufficient, thus all these reductions are subsequently added to 𝓘 and hence removed from 𝓒 (note that this also includes N). If 𝓒 becomes empty, then there is no seed.

The purpose of enlarging N into E is to quickly prune the candidate set 𝓒. We could just add all the insufficient reductions {N′ | N′ ⊆ N} to 𝓘, but note that |{N′ | N′ ⊆ E}| is exponentially larger than |{N′ | N′ ⊆ N}| w.r.t. |E| − |N|. The enlargement, shown in Algorithm 4, works almost dually to shrinking. Let N = A⟨D,I⟩. The algorithm attempts to add, one by one, the constraints from Ψ(Δ) \ D and Ψ(Inv) \ I to D and I, respectively, checking each emerging reduction for sufficiency and keeping only the changes that leave A⟨D,I⟩ insufficient.
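The pair of procedures can be sketched as follows. The sketch keeps the candidate set explicitly as a Python set (the paper keeps it symbolically, Section 3.4), and `is_sufficient` is again a hypothetical verifier callback, not the tool's API.

```python
def enlarge(reduction, universe, is_sufficient):
    """Sketch of Algorithm 4: grow an insufficient reduction while it
    stays insufficient. `universe` is the set of all simple clock
    constraints Ψ(Δ) ∪ Ψ(Inv), modeled as a plain set."""
    current = set(reduction)
    for c in sorted(universe - current):    # fixed order for determinism
        if not is_sufficient(current | {c}):
            current.add(c)                  # keep c: still insufficient
    return frozenset(current)


def find_seed(candidates, universe, is_sufficient, known_insufficient):
    """Sketch of Algorithm 3: pick candidates of size |M| - 1 until a
    sufficient one (a seed) is found; block all subsets of the enlarged
    insufficient reductions."""
    while candidates:
        n = candidates.pop()
        if is_sufficient(n):
            return n                        # a seed
        e = enlarge(n, universe, is_sufficient)
        known_insufficient.add(e)           # symbolically: all N' ⊆ e
        candidates = {c for c in candidates if not c <= e}
    return None                             # no seed: the last MSR is minimum
```

On the toy predicate from before (sufficient iff {0, 1} is contained), enlarging the insufficient candidate {2, 3} absorbs every constraint except one of {0, 1}, pruning exponentially many candidates at once.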

#### **3.4 Representation of 𝓘 and 𝓒**

The final piece of the puzzle is how to efficiently manipulate the sets 𝓘 and 𝓒. In particular, we add reductions to 𝓘 and 𝓒, remove reductions from 𝓒, check if a reduction belongs to 𝓘, check if 𝓒 is empty, and pick a reduction from 𝓒. The problem is that the sizes of these sets can be exponential w.r.t. |Ψ(Δ) ∪ Ψ(Inv)| (there are exponentially many reductions), and thus it is practically intractable to maintain the sets explicitly. Instead, we use a symbolic representation. Given a TA A with simple clock constraints Ψ(Δ) = {(e1, ϕ1), ..., (ep, ϕp)} and Ψ(Inv) = {(l1, ϕ1), ..., (lq, ϕq)}, we introduce two sets X = {x1, ..., xp} and Y = {y1, ..., yq} of Boolean variables. Note that every valuation of the variables X ∪ Y maps one-to-one to the reduction A⟨D,I⟩ such that (ei, ϕi) ∈ D iff xi is assigned True and (lj, ϕj) ∈ I iff yj is assigned True.

The set 𝓘 is gradually built during the whole computation of Algorithm 1. To represent 𝓘, we build a Boolean formula 𝕀 such that a reduction N does **not** belong to 𝓘 iff N **does** correspond to a model of 𝕀. Initially, 𝓘 = ∅, thus 𝕀 = True. To add an insufficient reduction A⟨D,I⟩ and all reductions N with N ⊆ A⟨D,I⟩ to 𝓘, we add to 𝕀 the clause (⋁_{(ei,ϕi) ∈ Ψ(Δ)\D} xi) ∨ (⋁_{(lj,ϕj) ∈ Ψ(Inv)\I} yj).
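The semantics of such a blocking clause can be illustrated on a tiny instance. The sketch below enumerates all valuations by brute force as a stand-in for the SAT solver; the three selector variables are, of course, an assumption of the example.

```python
from itertools import product

# Three Boolean selector variables, one per simple clock constraint.
variables = ['x1', 'x2', 'x3']

def blocking_clause(insufficient):
    """Clause added to the formula I when the reduction selecting the
    variables in `insufficient` is found insufficient: at least one
    selector OUTSIDE the reduction must be True."""
    return [v for v in variables if v not in insufficient]

def models(clauses):
    """All valuations (as sets of True variables) satisfying every
    clause -- a brute-force stand-in for the SAT solver."""
    result = set()
    for bits in product([False, True], repeat=len(variables)):
        true_vars = frozenset(v for v, b in zip(variables, bits) if b)
        if all(any(v in true_vars for v in clause) for clause in clauses):
            result.add(true_vars)
    return result
```

Blocking the reduction {x1, x2} yields the clause (x3), whose models are exactly the valuations that are not subsets of {x1, x2} -- the insufficient reduction and all of its subsets are excluded in a single clause.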

The set 𝓒 is rebuilt during each call of the procedure findSeed based on Eq. 5 and is encoded via a Boolean formula ℂ such that every model of ℂ corresponds to a reduction N ∈ 𝓒:

$$\mathbb{C} = \mathbb{I} \land \bigwedge_{\mathcal{A}_{\langle D,I \rangle} \in \mathcal{M}} \Big( \big(\bigvee_{(e_i, \varphi_i) \in \Psi(\Delta) \setminus D} x_i\big) \lor \big(\bigvee_{(l_j, \varphi_j) \in \Psi(Inv) \setminus I} y_j\big) \Big) \land \mathtt{trues}(|M| - 1) \tag{6}$$

where trues(|M| − 1) is a cardinality encoding forcing that exactly |M| − 1 variables from X ∪ Y are set to True. To check whether 𝓒 = ∅, or to pick a reduction N ∈ 𝓒, we ask a SAT solver for a model of ℂ. To remove an insufficient reduction from 𝓒, we update the formula 𝕀 (and thus also ℂ) as described above.

#### **3.5 Related Work**

Although the concept of minimal sufficient reductions (MSRs) is novel in the context of timed automata, similar concepts appear in other areas of computer


**Algorithm 3:** findSeed(M, 𝓜, 𝓘)

1. **while** {N ∈ 𝓡_A | N ∉ 𝓘 ∧ ∀M′ ∈ 𝓜. N ⊈ M′ ∧ |N| = |M| − 1} ≠ ∅ **do**
2. &nbsp;&nbsp;&nbsp;&nbsp;N ← pick from {N ∈ 𝓡_A | N ∉ 𝓘 ∧ ∀M′ ∈ 𝓜. N ⊈ M′ ∧ |N| = |M| − 1}
3. &nbsp;&nbsp;&nbsp;&nbsp;**if** N *is sufficient* **then return** N, 𝓘
4. &nbsp;&nbsp;&nbsp;&nbsp;**else** 𝓘 ← 𝓘 ∪ {N′ ∈ 𝓡_A | N′ ⊆ enlarge(N)}
5. **return** null, 𝓘

**Algorithm 4:** enlarge(A⟨D,I⟩)

1. **foreach** c ∈ (Ψ(Δ) ∪ Ψ(Inv)) \ (D ∪ I) **do**
2. &nbsp;&nbsp;&nbsp;&nbsp;**if** c ∈ Ψ(Δ) *and* A⟨D∪{c},I⟩ *is insufficient* **then** D ← D ∪ {c}
3. &nbsp;&nbsp;&nbsp;&nbsp;**if** c ∈ Ψ(Inv) *and* A⟨D,I∪{c}⟩ *is insufficient* **then** I ← I ∪ {c}
4. **return** A⟨D,I⟩

science. For example, see minimal unsatisfiable subsets [15], minimal correction subsets [47], minimal inconsistent subsets [16,18], or minimal inductive validity cores [32]. All these concepts can be generalized as minimal sets over monotone predicates (MSMPs) [48,49]. The input is a reference set R and a monotone predicate **P**: 𝒫(R) → {1, 0}, and the goal is to find minimal subsets of R that satisfy the predicate. In the case of MSRs, the reference set is the set of all simple constraints Ψ(Δ) ∪ Ψ(Inv) and, for every D ∪ I ⊆ Ψ(Δ) ∪ Ψ(Inv), the predicate is defined by **P**(D ∪ I) = 1 iff A⟨D,I⟩ is sufficient. Many algorithms have been proposed (e.g., [45,14,19,22,20,47,21,37,32,23]) for finding MSMPs for particular instances of the MSMP problem. However, these algorithms are dedicated to the particular instances and extensively exploit specific properties of those instances (just as we exploit reduction cores in the case of MSRs). Consequently, the algorithms either cannot be used for finding MSRs, or they would be rather inefficient.

### **4 Synthesis of Relaxation Parameters**

The main objective of this study is to make the target locations L_T of a given TA A = (L, l0, C, Δ, Inv) reachable by only modifying the constants of simple constraints of A. In the previous section, we presented an efficient algorithm to find a set of simple clock constraints D ⊆ Ψ(Δ) (1) (over transitions) and I ⊆ Ψ(Inv) (2) (over locations) such that the target set is reachable when the constraints D and I are removed from A. In other words, L_T is reachable on A⟨D,I⟩. Consequently, a verifier generates a finite run ρ′_LT = (l0, **0**) →d0 (l1, v1) →d1 ... →d_{n−1} (ln, vn) of A⟨D,I⟩ such that ln ∈ L_T. Let π′_LT = l0, e′1, l1, ..., e′_{n−1}, ln be the corresponding path on A⟨D,I⟩, i.e., π′_LT is realizable on A⟨D,I⟩ due to the delay sequence d0, d1, ..., d_{n−1}, and the resulting run is ρ′_LT. The corresponding path on the original TA A, defined as in (4), is:

$$
\pi'_{L_T} = M(\pi_{L_T}), \text{ where } \pi_{L_T} = l_0, e_1, l_1, \dots, e_{n-1}, l_n \tag{7}
$$

While π′_LT is realizable on A⟨D,I⟩, π_LT is not realizable on A since L_T is not reachable on A. We present an MILP-based method to find a relaxation valuation **r**: D ∪ I → ℕ ∪ {∞} such that the path induced by π_LT is realizable on A⟨D,I,**r**⟩.

Given an automaton path π = l0, e1, l1, ..., e_{n−1}, ln with ei = (l_{i−1}, λi, φi, li) for each i = 1, ..., n−1, we introduce real-valued delay variables δ0, ..., δ_{n−1} that represent the time spent in each location along the path. Since clocks measure the time passed since their last resets, for a fixed path, a clock in a given constraint (invariant or guard) can be mapped to a sum of delay variables:

$$\Gamma(x,\pi,i) = \delta_k + \delta_{k+1} + \dots + \delta_{i-1} \text{ where } k = \max(\{m \mid x \in \lambda_m,\, m < i\} \cup \{0\}) \tag{8}$$

The value of clock x equals Γ(x, π, i) on the i-th transition ei along π. In (8), k is the index of the transition on which x is last reset before ei along π, and it is 0 if x is not reset. Γ(0, π, i) is defined as 0 for notational convenience.
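The mapping (8) is purely combinatorial and can be sketched in a few lines. The list-of-sets encoding of the reset sets λ_m is an assumption of this sketch.

```python
def gamma(x, resets, i):
    """Delay-variable indices whose sum gives the value of clock `x`
    on the i-th transition of a fixed path (Eq. 8).

    `resets[m]` is the reset set λ_m of transition e_m, for m = 1, ...;
    resets[0] is unused padding so that indices match the paper's."""
    # k = max({m | x ∈ λ_m, m < i} ∪ {0}): last reset of x before e_i.
    k = max([m for m in range(1, i) if x in resets[m]], default=0)
    return list(range(k, i))   # Γ(x, π, i) = δ_k + δ_{k+1} + ... + δ_{i-1}
```

For example, with λ1 = {x}, λ2 = ∅, λ3 = {x, y}, the value of x on e3 is δ1 + δ2 (x was last reset on e1), while y, never reset, accumulates δ0 + δ1 + δ2.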

Guards. For transition ei, each simple constraint ϕ = x − y ∼ c ∈ S(φi) on the guard φi is mapped to the new delay variables as:

$$
\Gamma(x,\pi,i) - \Gamma(y,\pi,i) \sim c + p_{e_i,\varphi} \tag{9}
$$

where p_{ei,ϕ} is a new integer-valued relaxation variable if (ei, ϕ) ∈ D; otherwise it is set to 0.

Invariants. Each clock constraint ϕ = x − y ∼ c ∈ S(Inv(li)) of the invariant of location li is mapped to an arriving (10) and a leaving (11) constraint over the delay variables, since the invariant must be satisfied when arriving at and when leaving the location (and hence, due to the convexity of the invariant, also while in the location).

$$\begin{aligned} \Gamma(x,\pi,i) \cdot \mathbf{I}(x \notin \lambda_i) - \Gamma(y,\pi,i) \cdot \mathbf{I}(y \notin \lambda_i) &\sim c + p_{l_i,\varphi} \quad \text{if } i > 0 \text{ (arriving)} \quad (10) \\ \Gamma(x,\pi,i+1) - \Gamma(y,\pi,i+1) &\sim c + p_{l_i,\varphi} \quad \text{(leaving)} \quad (11) \end{aligned}$$

where **I** is a binary function mapping true to 1 and false to 0, and p_{li,ϕ} is a new integer-valued variable if (li, ϕ) ∈ I; otherwise it is set to 0.

Finally, we define an MILP (12) for the path π. The constraint relaxation variables {p_{l,ϕ} | (l, ϕ) ∈ I} and {p_{e,ϕ} | (e, ϕ) ∈ D} (integer-valued), and the delay variables δ0, ..., δ_{n−1} (real-valued), are the decision variables of the MILP.

$$\text{minimize } \sum_{(l,\varphi) \in I} p_{l,\varphi} + \sum_{(e,\varphi) \in D} p_{e,\varphi} \tag{12}$$

subject to (9) for each i = 1, ..., n − 1 and x − y ∼ c ∈ S(φi),

(10) for each i = 1, ..., n and x − y ∼ c ∈ S(Inv(li)),

(11) for each i = 0, ..., n − 1 and x − y ∼ c ∈ S(Inv(li)),

p_{l,ϕ} ∈ ℤ⁺ for each (l, ϕ) ∈ I, and p_{e,ϕ} ∈ ℤ⁺ for each (e, ϕ) ∈ D.
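The generation of the guard constraints (9) from a path can be sketched symbolically, without a solver: each constraint becomes a tuple of delay-variable indices, a relation, a constant, and an optional relaxation variable. All container shapes below (dicts of tuples, `None` for a missing second clock) are assumptions of this sketch, not the tool's data format.

```python
def gamma(x, resets, i):
    """Γ from Eq. (8), repeated here so the sketch is self-contained."""
    k = max([m for m in range(1, i) if x in resets[m]], default=0)
    return list(range(k, i))

def guard_constraints(resets, guards, relaxed, n):
    """Emit the guard constraints (9) for a path with n transitions.

    `guards[i]` lists the simple constraints (x, y, rel, c) of the guard
    of transition e_i, with y = None for single-clock constraints;
    `relaxed` is the set D of constraints that receive a relaxation
    variable. Each output tuple is
    (delay_indices_x, delay_indices_y, rel, c, relax_var)."""
    out = []
    for i in range(1, n + 1):
        for (x, y, rel, c) in guards.get(i, []):
            delays_x = gamma(x, resets, i)
            delays_y = gamma(y, resets, i) if y is not None else []
            # Constraints outside D get no relaxation variable (bound c + 0).
            relax = f"p_e{i}" if (i, (x, y, rel, c)) in relaxed else None
            out.append((delays_x, delays_y, rel, c, relax))
    return out
```

Feeding these tuples (together with the analogous arriving/leaving invariant constraints) to an MILP solver such as CBC, with objective (12), yields the relaxation valuation; the paper's tool uses the CBC solver via OR-Tools for this step.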

Let {p⋆_{l,ϕ} | (l, ϕ) ∈ I}, {p⋆_{e,ϕ} | (e, ϕ) ∈ D}, and δ⋆_0, ..., δ⋆_{n−1} denote the solution of the MILP (12). Define a relaxation valuation **r** with respect to the solution as

$$\mathbf{r}(l,\varphi) = p_{l,\varphi}^{\star} \text{ for each } (l,\varphi) \in I, \quad \mathbf{r}(e,\varphi) = p_{e,\varphi}^{\star} \text{ for each } (e,\varphi) \in D. \tag{13}$$

**Theorem 1.** Let A = (L, l0, C, Δ, Inv) be a timed automaton, π = l0, e1, l1, ..., en, ln be a finite path of A, and D ⊂ Ψ(Δ), I ⊂ Ψ(Inv) be guard and invariant constraint sets. If the MILP constructed from A, π, D and I as defined in (12) is feasible, then ln is reachable on A⟨D,I,**r**⟩ with **r** as defined in (13).

Proof sketch. Let {p⋆_{l,ϕ} | (l, ϕ) ∈ I}, {p⋆_{e,ϕ} | (e, ϕ) ∈ D}, and δ⋆_0, ..., δ⋆_{n−1} be the optimal solution of the MILP (12). Define the clock value sequence v0, v1, ..., vn with respect to the path π with ei = (l_{i−1}, λi, φi, li) and the delay sequence δ⋆_0, ..., δ⋆_{n−1} iteratively as v0 = **0** and vi = (v_{i−1} + δ⋆_{i−1})[λi := 0] for each i = 1, ..., n. Along the path π, vi is consistent with Γ(·, π, i) (8) such that

$$(a)\ v_i(x) = \Gamma(x, \pi, i) \cdot \mathbf{I}(x \notin \lambda_i) \qquad \text{and} \qquad (b)\ v_i(x) + \delta_i^\star = \Gamma(x, \pi, i+1) \tag{14}$$

The MILP (12) constraints and (14) imply that the path M(π) that ends in ln is realizable on A⟨D,I,**r**⟩ via the delay sequence δ⋆_0, ..., δ⋆_{n−1}.

A linear programming (LP) based approach was used in [27] to generate the optimal delay sequence for a given path of a weighted timed automaton. In our case, the optimization problem is in MILP form since we find an integer-valued relaxation valuation (**r**) in addition to the delay variables.

Recall that we construct the relaxation sets D and I via Algorithm 1 and define the path π_LT (7) that reaches L_T such that the corresponding path π′_LT is realizable on A⟨D,I⟩. Then, we define the MILP (12) with respect to π_LT, D and I, and define **r** (13) according to the optimal solution. Note that this MILP is always feasible since π′_LT is realizable on A⟨D,I⟩. Finally, by Theorem 1, we conclude that L_T is reachable on A⟨D,I,**r**⟩.

Example 3. For the TA shown in Fig. 1, Algorithm 1 generates A⟨D,I⟩ with D = {(e5, x ≥ 25)} and I = {(l3, u ≤ 26)} such that π = l0, e1, l1, e2, l2, e3, l1, e4, l3, e5, l4 is realizable on A⟨D,I⟩. The MILP is constructed for π, D and I with decision variables p_{e5, x≥25}, p_{l3, u≤26}, δ0, δ1, δ2, δ3, δ4 and δ5 as in (12). The solution is p⋆_{e5, x≥25} = 3, p⋆_{l3, u≤26} = 5, and the delay sequence is 9, 4, 0, 9, 9, 0. Consequently, l4 is reachable on A⟨D,I,**r**⟩ with **r**(e5, x ≥ 25) = 3 and **r**(l3, u ≤ 26) = 5.

### **5 Case Study**

We implemented the proposed reduction and relaxation methods in a tool called Tamus. We use UPPAAL for sufficiency checks and witness computation, and the CBC solver from the OR-Tools library [50] for the MILP part. All experiments were run on a laptop with an Intel i5 quad-core processor at 2.5 GHz and 8 GB RAM. The tool and the benchmarks used are available at https://github.com/jar-ben/tamus.

As discussed in Section 1, an alternative approach to solving our problem (Problem 1) is to parameterize each simple clock constraint of the TA. Then, we can run a parameter synthesis tool on the parameterized TA to identify the set of all possible valuations of the parameters for which the TA satisfies the reachability property. Subsequently, we can choose the valuations that assign non-zero values (i.e., relax) to the minimum number of parameters, and out of these, we

**Table 1.** Results for the scheduler TA, where |Ψ| = |Ψ(Δ) ∪ Ψ(Inv)| is the total number of constraints, d = |D ∪ I| is the minimum MSR size, v is the number of reachability checks, t is the computation time in seconds (including the reachability checks), and c_m is the optimal cost of (12).


can choose the one with the minimum cumulative change of timing constants. In our experimental evaluation, we use a state-of-the-art parameter synthesis tool called Imitator [9] to run this analysis. Although Imitator is not tailored to our problem, it allows us to measure the relative scalability of our approach compared to a well-established synthesis technique.

We used two collections of benchmarks: one obtained from the literature, and one of crafted timed automata modeling a machine scheduling problem. All experiments were run with a time limit of 20 minutes per benchmark.

**Machine Scheduling** A scheduler automaton is composed of a set of paths from location l0 to location l1. Each path π = l0, ek, lk, e_{k+1}, ..., l_{k+M−1}, e_{k+M}, l1 represents a particular scheduling scenario, where each intermediate location, e.g. li for i = k, ..., k+M−1, belongs to a unique path (only one incoming and one outgoing transition). Thus, a TA that has p paths with M intermediate locations in each path has M·p + 2 locations and (M+1)·p transitions. Each intermediate location represents a machine operation, and periodic simple clock constraints are introduced to mimic the limitations on the corresponding durations. For example, assume that the total time to use the machines represented by locations l_{k+i} and l_{k+i+1} is upper (or lower) bounded by c for i = 0, 2, ..., M−2. To capture such a constraint with a period of t = 2, a new clock x is introduced and it is reset and checked on every t-th transition along the path, i.e., for every m ∈ {i·t + k | i·t ≤ M−1}, let em = (lm, λm, φm, l_{m+1}), add x to λm, and set φm := φm ∧ x ≤ c (x ≥ c for a lower bound). A periodic constraint is denoted by (t, c, ∼), where t is its period, c is the timing constant, and ∼ ∈ {<, ≤, >, ≥}. A set of such constraints is defined for each path to capture possible restrictions.
In addition, a bound T on the total execution time is captured with the constraint x ≤ T on transition e_{k+M} over a clock x that is not reset on any transition. A realizable path to l1 represents a feasible scheduling scenario, thus the target set is L_T = {l1}. We have generated 24 test cases. A test case A(c,p,M) represents a timed automaton with c ∈ {3, 5, 7} clocks and p ∈ {1, 2} paths with M ∈ {12, 18, 24, 30} intermediate locations in each path. R_{c,i} is the set of

**Table 2.** Experimental results for the benchmarks, where |Ψ|, d, v, t and c_m are as defined in Table 1, |Ψ^u| is the number of constraints considered in the analysis, and m is the number of mutated constraints. t^I, t^IT, t^Ic and t^ITc are the Imitator computation times, where c indicates that the early-termination flag ("counterexample") is used (otherwise the largest set of parameters is searched), and T indicates that only the constraints from the MSR identified by Tamus are parametrized (otherwise all constraints from Ψ^u are parametrized). "to" indicates that the timeout limit (20 min.) was reached. We ran Imitator with the flag "incl". Note that when run with the flag "merge", the performance of Imitator increases on 2 benchmarks but decreases on 2 others.


periodic restrictions defined for the i-th path of an automaton with c clocks:

$$\begin{aligned} R_{3,1} &= \{ (2, 11, \geqslant), (3, 15, \leqslant) \} & R_{3,2} &= \{ (4, 17, \geqslant), (5, 20, \leqslant) \} \\ R_{5,1} &= R_{3,1} \cup \{ (4, 21, \geqslant), (5, 25, \leqslant) \} & R_{5,2} &= R_{3,2} \cup \{ (8, 33, \geqslant), (9, 36, \leqslant) \} \\ R_{7,1} &= R_{5,1} \cup \{ (6, 31, \geqslant), (7, 35, \leqslant) \} & R_{7,2} &= R_{5,2} \cup \{ (12, 49, \geqslant), (12, 52, \leqslant) \} \end{aligned}$$

Note that A(c,2,M) emerges from A(c,1,M) by adding a path with restrictions R_{c,2}.
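As a sanity check on the size formulas above, a small helper (hypothetical, not part of Tamus) computes the automaton size and the transition indices on which a periodic constraint is placed:

```python
def scheduler_size(p, M):
    """Location and transition counts of the scheduler TA with p paths
    and M intermediate locations per path: M*p + 2 and (M+1)*p."""
    return M * p + 2, (M + 1) * p

def periodic_positions(k, t, M):
    """Transition indices on which a periodic constraint with period t
    is reset and checked along a path whose first transition has index
    k: every m = i*t + k with i*t <= M - 1."""
    return [i * t + k for i in range(M) if i * t <= M - 1]
```

For instance, the test case A(c,2,12) has 12·2 + 2 = 26 locations and 13·2 = 26 transitions, and a period-2 constraint on a path starting at transition 0 is checked on transitions 0, 2, 4, ....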

Table 1 shows the results achieved by Tamus on these models. Tamus solved all models; the hardest one, A(7,1,30), took only 14.12 seconds. As expected, the computation time t increases with the number |Ψ| of simple clock constraints in the model. Moreover, the computation time correlates strongly with the size d of the minimum MSR. In particular, comparing the two generic models A(c,1,M) and A(c,2,M): although A(c,2,M) has one more path and more constraints, Tamus is faster on A(c,2,M) since it quickly converges to the path with smaller MSRs.

Imitator solved A(3,1,12), A(3,2,12), A(3,1,18), and A(5,1,12) within 0.08, 0.5, 61, and 67 seconds, respectively, and timed out for the other models. In addition, we ran Imitator with the flag "counterexample", which terminates the computation when a satisfying valuation is found. The use of this flag reduced the computation time for the aforementioned cases, and it allowed two more models to be solved: A(3,2,18) and A(5,2,12). However, with this flag, Imitator often did not provide a solution that minimizes the number of relaxed simple clock constraints.

**Benchmarks from Literature** We collected 10 example models from the literature, including models with a safety specification that requires avoiding a set

of locations L_A, and models with a reachability specification with a set of target locations L_T as considered in this paper. In both cases, the original models satisfy the given specification. For the first case, we define L_A as the target set and apply our method. Here, we find the minimal number of timing constants that need to be changed to reach L_A, i.e., to violate the original safety specification. For the second case, inspired by mutation testing [2], we change a number of constraints in the original model so that L_T becomes unreachable. Eight of the examples are networks of TAs; while a network of TAs can be represented as a single product TA and hence our approach can handle it, Tamus currently supports only MSR computation for networks of TAs, not the MILP relaxation.

The results are shown in Table 2. Tamus computed a minimum MSR for all the models and also provided the MILP relaxation for the non-network models. Note that the bottleneck of our approach is the MSR computation and especially the verifier calls; the MILP part always took only a few milliseconds (including the models from Table 1), so we believe this would also be the case for networks of TAs. The base variant of Imitator, which computes the set of all satisfying parameter valuations, solved only 4 of the 10 models. When run with the early-termination flag, Imitator solved 3 more models; however, as discussed above, the provided solutions might not be optimal. We also evaluated a combination of Tamus and Imitator. In particular, we first ran Tamus to compute a minimum MSR A⟨D,I⟩, then parameterized the constraints D ∪ I in the original TA A, and ran Imitator on the parameterized TA. In this case, Imitator solved 9 out of 10 models. Moreover, we have the guarantee that we found the optimal solution: the MSR ensures that we relax the minimum number of simple clock constraints, and Imitator finds all satisfying parameterizations of the constraints, hence also the one with the minimum cumulative change of timing constants.

**Conclusion** In this work, we proposed the novel concept of a minimum MSR for a TA, that is, a minimum-size set of simple constraints that need to be relaxed to satisfy a reachability specification. We developed efficient methods to find a minimum MSR and presented an MILP-based solution to tune these constraints. Our analysis on benchmarks showed that our tool Tamus can generate a minimum MSR within seconds even for large systems. In addition, we compared our results with Imitator and observed that Tamus scales much better. However, Tamus minimizes the cumulative change of the constraints from a minimum MSR by considering a single witness path. If the goal is to find a minimal relaxation globally, i.e., w.r.t. all witness paths for the MSR, we recommend using the combined version of Tamus and Imitator: first run Tamus to find a minimum MSR, then parametrize each constraint from the MSR and run Imitator to find all satisfying parameter valuations, including the global optimum.

**Acknowledgements** This research was supported in part by ERDF "CyberSecurity, CyberCrime and Critical Information Infrastructures Center of Excellence" (No. CZ.02.1.01/0.0/0.0/16_019/0000822) and in part by the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 798482.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Iterative Bounded Synthesis for Efficient Cycle Detection in Parametric Timed Automata**<sup>∗</sup>

Étienne André<sup>1</sup>, Jaime Arias<sup>2</sup>, Laure Petrucci<sup>2</sup>, and Jaco van de Pol<sup>3</sup>

<sup>1</sup> Université de Lorraine, CNRS, Inria, LORIA, Nancy, France <sup>2</sup> LIPN, CNRS UMR 7030, Université Sorbonne Paris Nord, Villetaneuse, France <sup>3</sup> Aarhus University, Aarhus, Denmark, jaco@cs.au.dk

**Abstract.** We study semi-algorithms to synthesise the constraints under which a Parametric Timed Automaton satisfies some liveness requirement. The algorithms traverse a possibly infinite parametric zone graph, searching for accepting cycles. We provide new search and pruning algorithms, leading to successful termination for many examples. We demonstrate the success and efficiency of these algorithms on a benchmark. We also illustrate parameter synthesis for the classical Bounded Retransmission Protocol. Finally, we introduce a new notion of completeness in the limit, to investigate if an algorithm enumerates all solutions.

**Keywords:** Parameter Synthesis, Liveness Properties, IMITATOR

### **1 Introduction**

Many critical devices and processes in our society are controlled by software, in which real-time aspects often play a crucial role. Timed Automata (TA [1]) are an important formalism to design and study real-time systems; they extend finite automata with real-valued clocks. Their success is based on the decidability of the basic analysis problems of checking reachability and liveness properties.

Precise timing information is often unknown during the design phase. Therefore, Parametric Timed Automata (PTA [2]) extend TA with parameters, representing unknown waiting times, deadlines, network speed, etc. A single PTA represents an infinite class of TA. To facilitate design exploration, parameter constraint synthesis aims at a description of all parameter values for which the system meets some requirement. Unfortunately, it is already undecidable to check if a PTA admits a parameter valuation for which a bad state can be reached [2,3].

In this paper, we study the parameter constraint synthesis problem for liveness properties of the full class of PTA. In particular, the goal is to compute the parameter valuations for which a Parametric Timed Büchi Automaton has a non-empty language. Note that this allows handling requirements in LTL and MITL [24]. We represent the solution concisely as a disjunction of conjunctions

<sup>∗</sup> This work is partially supported by projects CNRS-INS2I TrAVAIL, IFD SECReTS and ANR-NRF ProMiS (ANR-19-CE25-0015).

<sup>©</sup> The Author(s) 2021

J. F. Groote and K. G. Larsen (Eds.): TACAS 2021, LNCS 12651, pp. 311–329, 2021. https://doi.org/10.1007/978-3-030-72016-2_17

of linear inequalities between the parameters (a set of convex polyhedra).

We will consider semi-algorithms that operate on the so-called parametric zone graph (PZG), where a parametric zone is a conjunction of linear inequalities over clock and parameter values. These semi-algorithms may not terminate since the PZG can be infinite. However, even in that case, we are interested in the soundness and completeness of the set of all enumerated solutions.

Our contributions to the parameter constraint synthesis for liveness of PTA are: 1) A definition of soundness and completeness for non-terminating algorithms. 2) A new synthesis algorithm, using bounded search with iterative deepening; this is the first algorithm that enumerates all accepting cycles in the possibly infinite PZG, in contrast to previous NDFS-based algorithms [25]. 3) An experimental benchmark, comparing the successful termination and runtime efficiency of all algorithms. 4) A case study on the Bounded Retransmission Protocol.

**Related Work.** Decidability for (subclasses of) PTA has been extensively studied [2,19,3]. We study the emptiness and related synthesis problem for Parametric Timed Büchi Automata with unrestricted use of rational parameters and real-valued clocks. In this general case, the model checking problem is undecidable [2] and therefore exact synthesis is out of reach (in contrast to the setting with bounded integers [20,11]). Decidability of liveness properties for a subclass of PTA, where the occurrence of parameters is restricted, is discussed in [8].

Our approach inherits basic techniques from Timed Automata, in particular the zone graph. For TA, the zone graph is finite after LU-abstraction [27,23,17]. Another technique prunes states that are subsumed by larger states. Subsumption must be applied with care, in order to preserve liveness properties [22,18].

Previous semi-algorithms were based on Nested Depth-First Search (NDFS). They search the (possibly infinite) parametric zone graph (PZG) for accepting cycles. Their zones are projected onto the parameters and accumulated into the global constraint. The basic cumulative algorithm [11] prunes states whose projected zone is already included in the accumulated constraint. The cumulative algorithm was extended with subsumption and layering for PTA [25]. The problem with all NDFS-based algorithms is that the computation can diverge in one branch, missing solutions for accepting cycles in other branches forever.

Our main improvement is a bounded approach, which can be combined with breadth- and depth-first search. We check for accepting cycles up to a certain bound, and keep increasing the bound to achieve completeness in the limit. Eventually, this will enumerate all parametric constraints corresponding to all accepting cycles in the PZG. Sometimes, the combination of bounded search and subsumption can even identify infinite paths that do not form a cycle, but this is not guaranteed. A previous proposal for Bounded Model Checking for PTA [21] considers the region graph and has not been implemented. We will provide several small illustrative examples inspired by the invited talk [26].

To evaluate our algorithms, we implemented them in the IMITATOR toolset [6], extending its functionality from reachability to liveness properties. This way, we can reuse its PTA benchmark [4]. We also reimplemented the algorithms of [11,25] in a single NDFS framework. We illustrate our method on the Bounded Retransmission Protocol (BRP). We synthesize parameter constraints for liveness properties of BRP for the first time. Our constraints are more liberal than the constraints reported in previous work [14,19].

### **2 PTA, Parametric Zone Graphs and Accepted Runs**

Let X be a set of real-valued clocks (e.g. x, y) and let P be a set of rational parameters (e.g. p, q). A linear term over parameters (plt) is an expression of the form Σᵢ αᵢpᵢ + β, where pᵢ ∈ P and coefficients αᵢ, β ∈ ℚ. A (diagonal) inequality is of the form x₁ − x₂ ⊲ plt, with xᵢ ∈ X ∪ {0} and ⊲ ∈ {<, ≤, =, ≥, >}. Examples are x − y ≤ 2p + q, x > q − 1 and 2 ≤ p. A (convex) constraint (or zone Z) is a conjunction of inequalities. We write C for the set of zones.

We define a PTA A = (L, ℓ₀, F, I, E), where L is a finite set of locations, ℓ₀ ∈ L is the initial location and F ⊆ L is the set of accepting locations. I : L → C denotes an invariant for each location, and E is a set of transitions of the form (ℓ, g, R, ℓ′), with source ℓ ∈ L, target ℓ′ ∈ L, guard g ∈ C and clock reset R ⊆ X.

The concrete semantics of a PTA is defined in terms of valuations. A parameter valuation is a function v : P → ℚ≥0 and a clock valuation is a function w : X → ℝ≥0. Let d ∈ ℝ≥0 be a delay, then we define

**Fig. 1.** PTA A₁

the clock valuation w + d such that (w + d)(x) := w(x) + d. Let R ⊆ X be a clock reset, then we define the clock valuation w[R](x) := 0 if x ∈ R and w(x) otherwise. We write **0** for the clock valuation s.t. ∀x ∈ X : **0**(x) = 0. We extend parameter valuations to linear terms. We write v, w ⊨ (xᵢ − xⱼ ⊲ plt) iff w(xᵢ) − w(xⱼ) ⊲ v(plt), and v, w ⊨ Z iff v, w ⊨ e for all inequalities e in Z.
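The valuation operations above (delay w + d, reset w[R], and satisfaction v, w ⊨ e) can be sketched as follows. This is a toy encoding of ours, not IMITATOR's API; the constant clock 0 is written as the string `"0"`:

```python
# Hypothetical mini-encoding of clock/parameter valuations and the
# satisfaction relation v, w |= (x_i - x_j <op> plt).
import operator

OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       ">=": operator.ge, ">": operator.gt}

def eval_plt(coeffs, beta, v):
    """Evaluate a linear term over parameters: sum_i alpha_i * p_i + beta."""
    return sum(alpha * v[p] for p, alpha in coeffs.items()) + beta

def delay(w, d):
    """Clock valuation w + d: every clock advances by the delay d."""
    return {x: t + d for x, t in w.items()}

def reset(w, R):
    """Clock valuation w[R]: clocks in R are set to 0, others unchanged."""
    return {x: (0 if x in R else t) for x, t in w.items()}

def sat(w, v, ineq):
    """Check v, w |= (xi - xj <op> plt); '0' denotes the constant clock 0."""
    xi, xj, op, coeffs, beta = ineq
    lhs = w.get(xi, 0) - w.get(xj, 0)
    return OPS[op](lhs, eval_plt(coeffs, beta, v))

v = {"p": 2.5, "q": 1.0}
w = delay({"x": 0.0, "y": 0.0}, 1.0)                     # w = (x=1, y=1)
# x - y <= 2p + q holds, since 0 <= 6:
print(sat(w, v, ("x", "y", "<=", {"p": 2, "q": 1}, 0)))  # True
# after resetting x, the guard x >= 1 fails:
print(sat(reset(w, {"x"}), v, ("x", "0", ">=", {}, 1)))  # False
```

A zone is then a list of such inequalities, satisfied iff every member is.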

Given a parameter valuation v, we write v(A) for the timed automaton obtained by replacing all parameters p in invariants and guards by v(p). The concrete semantics of a PTA A is derived from the TA v(A), and defined as a timed transition system with states (ℓ, w), initial state (ℓ₀, **0**) (we assume that **0** ⊨ I(ℓ₀)), and transitions → = →ᵈ · →ᵉ, where continuous time delay (→ᵈ) and discrete transitions (→ᵉ) are defined as

**–** If d ∈ ℝ≥0 and w + d ⊨ I(ℓ), then (ℓ, w) →ᵈ (ℓ, w + d).

**–** If e = (ℓ, g, R, ℓ′) ∈ E and w ⊨ g and w[R] ⊨ I(ℓ′), then (ℓ, w) →ᵉ (ℓ′, w[R]).

An infinite run (ℓ₀, w₀) → (ℓ₁, w₁) → ··· is accepted if it passes through an accepting location infinitely often, i.e. the set {i | ℓᵢ ∈ F} is infinite. We ignore the problem of Zeno runs, which can be avoided by a syntactic transformation [9].

Example 1. The PTA A₁ in Fig. 1 has locations {ℓ₀, ℓ₁}, clocks {x, y} and parameter p. Only ℓ₁ is accepting. The initial location ℓ₀ has an invariant consisting of two inequalities. Its self-loop is enabled if x ≥ 1 and it resets clock x. Note that clock y is never reset. For p = 2.5, we have the following example run:

(ℓ₀, (0, 0)) →¹ (ℓ₀, (1, 1)) → (ℓ₀, (0, 1)) →¹ (ℓ₀, (1, 2)) → (ℓ₁, (1, 2)).

Note that the accepting location ℓ₁ would not be reachable for p < 2. On the other hand, for all p ≥ 2, there exists an infinite accepted run through ℓ₁.

We will now recall from [5,20] the parametric zone graph (PZG), providing an abstract semantics to a PTA. A single PZG treats all parameter valuations symbolically. Also, the PZG avoids the uncountably infinite timed transition system. The PZG can still be (countably) infinite.

We first define some operations on zones, in terms of their valuations. It is well known that convex polyhedra are closed under these operations, and our implementation in IMITATOR uses the Parma Polyhedra Library [10].


The PZG is a transition system where each abstract state consists of a location and a non-empty zone. The PZG of A = (L, ℓ₀, F, I, E) is (S, s₀, ⇒, A), with S ⊆ L × C, initial state s₀ = (ℓ₀, (⋀_{x ∈ X} x = 0) ∩ I(ℓ₀)), and accepting states A = {(ℓ, Z) | ℓ ∈ F}. A transition step (ℓ, Z) ⇒ (ℓ′, Z′) exists if for some (ℓ, g, R, ℓ′) ∈ E we have Z′ = ((Z ∩ g)[R] ∩ I(ℓ′))↗ ∩ I(ℓ′) ≠ ∅, where ↗ denotes time elapse. We write ⇒⁺ (⇒*) for the transitive (reflexive-transitive) closure of ⇒.

Example 2. The PZG of A₁ from Ex. 1 is shown in Fig. 2; it extends infinitely to the right. We use that the zone (x = 0 ∧ y = 0), after letting time elapse, equals (y − x = 0). The loop on ℓ₀ can only be executed when x = 1, and it resets x := 0, while y is never reset. So after n executions of the loop, y − x = n. These n steps are only possible if p ≥ n.

The PZG obeys two important properties (Prop. 1 and 2). First, the parametric constraint can only decrease along the transitions in the PZG. Second, a state simulates the behaviour of any state that it subsumes. We first define these notions. We write Z ⊆ Z′ iff v, w ⊨ Z implies v, w ⊨ Z′ for all v, w. A state (ℓ, Z) is subsumed by (ℓ′, Z′), written (ℓ, Z) ⊑ (ℓ′, Z′), iff ℓ = ℓ′ and Z ⊆ Z′. We write s↓P for the projection of the zone of s onto the parameters P.


**Proposition 1 ([25]).** If s₁ ⇒ s₂ then s₂↓P ⊆ s₁↓P.

**Proposition 2 ([25]).** If s₁ ⇒ s₂ and s₁ ⊑ s′₁, then for some s′₂, s′₁ ⇒ s′₂ and s₂ ⊑ s′₂.

Example 3. The first ℓ₁ state in Fig. 2 shows that there is an infinite loop when p ≥ 2. By Prop. 1, the parametric zones of all states following the dashed red edge are contained in p ≥ 2. So we can prune the PZG at the dashed red arrow, since no new parameter valuations will be found.

**Fig. 2.** PZG of the PTA A₁ from Fig. 1
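The pruning argument of Ex. 3 can be mimicked with a toy encoding of ours (not IMITATOR code): a parameter zone of the form p ≥ c is represented by the bound c alone, so inclusion of projections reduces to comparing bounds, and the n-th ℓ₀-state of Fig. 2 projects to p ≥ n (cf. Ex. 2).

```python
# Toy illustration of cumulative pruning on the PZG of Fig. 2.
# Parameter zones p >= c are represented by the bound c, so zone
# inclusion (p >= a) ⊆ (p >= b) becomes a >= b.

def included(a, b):
    """Is (p >= a) included in (p >= b)?"""
    return a >= b

def explore_chain(collected_bound, max_steps=10):
    """Walk the l0-chain of Fig. 2, whose n-th state projects to p >= n,
    pruning any state whose projection is already inside the collected
    constraint p >= collected_bound."""
    visited = []
    for n in range(max_steps):
        if included(n, collected_bound):   # projection subsumed: prune (Prop. 1)
            break
        visited.append(("l0", n))
    return visited

# After the accepting cycle with constraint p >= 2 has been reported,
# only the initial states with strictly larger projections remain:
print(explore_chain(2))   # [('l0', 0), ('l0', 1)]
```

This is exactly why the search can stop at the dashed arrow: every pruned state has a projection included in the already-collected constraint, so it cannot contribute new parameter valuations.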

(a) PTA A₂ (b) Its PZG with an infinite accepted run, but no loop

**Fig. 3.** PTA A₂ with the corresponding PZG

Example 4. Fig. 3 shows PTA A₂ and its infinite PZG. The transition can only become enabled when p ≥ 5. Each transition must happen within the following p time units, so after n > 0 iterations, 5 ≤ x − y ≤ n × p. Note that s₁ ⇒ s₂ and s₁ ⊑ s₂. By Prop. 2, for some s′, s₂ ⇒ s′ and s₂ ⊑ s′. Repeating the argument, we can construct an infinite trace. So, although the PZG has no cycle, the presence of an infinite path can be deduced even if we prune the PZG at the dashed edge.
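To make the spiral argument concrete, here is a toy Python model of ours of Fig. 3(b), assuming (as in Ex. 4) that the states on the path are accepting. The n-th state carries the zone 5 ≤ x − y ≤ n·p, so state m is subsumed by state n exactly when m ≤ n; a DFS that checks whether the current state subsumes an accepting ancestor on its stack detects the spiral after two steps, even though no cycle exists.

```python
# Toy model of spiral detection in Ex. 4: states are numbered 1, 2, ...,
# state n carrying the zone 5 <= x - y <= n*p.

def subsumes(n, m):
    """Does state n subsume state m, i.e. zone(m) ⊆ zone(n)?"""
    return m <= n

def find_spiral(start, accepting, limit=10):
    """Follow the single successor chain s_n => s_{n+1}; stop as soon as
    the current state subsumes an accepting state on the search stack."""
    stack = []
    s = start
    while s <= limit:
        if any(subsumes(s, t) and t in accepting for t in stack):
            return ("spiral at", s)   # s simulates an accepting ancestor
        stack.append(s)
        s += 1
    return None

print(find_spiral(1, accepting={1}))  # ('spiral at', 2)
```

The `limit` parameter only keeps the toy loop finite; the point is that the spiral is reported long before any bound is reached.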

### **3 Sound and Complete Liveness Parameter Synthesis**

Given a PTA A, we aim at synthesising the parameter valuations v for which the TA v(A) contains an infinite accepted run. Our algorithms operate by searching the PZG (S, s₀, ⇒, A) for accepting "lassos" or, as in Ex. 4, 6 and 7, even for accepting "spirals". We write ⇒⁺ (⇒*) for the transitive (reflexive-transitive) closure of ⇒. An accepting lasso on s₁ consists of two finite paths s₀ ⇒* s₁ ⇒⁺ s₁, such that s₁ ∈ A. More generally, an accepting spiral on s₁ consists of two finite paths s₀ ⇒* s₁ ⇒⁺ s₂, with s₁ ∈ A and s₁ ⊑ s₂.

**Proposition 3.** If the PZG of PTA A contains an accepting spiral on s₁, then for all v ∈ s₁↓P, v(A) contains an (infinite) accepted run.

Proof. Assume s₀ ⇒* s₁ ⇒⁺ s₂ with s₁ ∈ A and s₁ ⊑ s₂. Note that s₂ ∈ A, since ⊑ only holds between states with the same location. Then by monotonicity, s₁↓P ⊆ s₂↓P, and by Prop. 1, s₂↓P ⊆ s₁↓P, so s₁↓P = s₂↓P. By Prop. 2, there exists some s₃ such that s₂ ⇒ s₃ and s₂ ⊑ s₃. We can repeat this to construct an infinite accepted run from s₁, with the constant parametric constraint s₁↓P. The states from s₀ ⇒* s₁ have an even larger constraint (Prop. 1). By the correspondence between runs in the PTA and runs in the PZG, we obtain an infinite accepted run in v(A) for every v ∈ s₁↓P.

The reverse of Prop. 3 is not true. An infinite PZG could contain an infinite path that does not form a lasso (or even a spiral). Such an infinite path in the PZG may or may not correspond to a concrete TA run.

Example 5. The situation of A₃ in Fig. 4 is quite different from Ex. 4. The PZG of A₃ has an infinite path of states (ℓ₀, Zᵢ), where Zᵢ contains the invariant x ≤ 1 ∧ y ≤ p and the additional constraints y − x = i ∧ p ≥ i. Note that at most p transitions can happen in A₃, since we cannot wait any longer when y ≥ p. So v(A₃) has only finite runs for any v. We call this infinite path infeasible, since ⋂ᵢ(Zᵢ↓P) = ∅.

**Fig. 4.** PTA A₃.

#### **3.1 Soundness and Completeness**

In contrast to TA, where both reachability and liveness properties are decidable [1], it is well-known that even reachability-emptiness for PTA is undecidable [2,3]. So in particular, we cannot expect a terminating, sound and complete algorithm for liveness synthesis. Instead, our algorithms are semi-algorithms, which enumerate a number of aggregate solutions, but may not terminate. Each aggregate solution will be presented as a convex polyhedral constraint on the parameters ("parametric zone").

Such semi-algorithms can either enumerate a finite number of aggregate solutions (after which they could terminate or diverge), or enumerate an infinite number of aggregates (and hence never terminate). Fig. 5 shows an example where the set of solutions, <sup>p</sup> ∈ {1, <sup>2</sup>, <sup>3</sup>,...}, is not equivalent to a finite disjunction of convex polyhedra, so no terminating algorithm can enumerate all aggregate solutions.<sup>1</sup>

**Fig. 5.** PTA A₄

In the rest of this section, we introduce and discuss various soundness and completeness requirements for semi-algorithms. Assume that the algorithm is run on an input PTA A, and let Sol be the set of all solutions, i.e. Sol = {v | v(A) has an accepted run}. Assume that the algorithm enumerates a finite or infinite collection of aggregate solutions, in the form of parametric zones Zᵢ.

Partial correctness: This traditional correctness criterion requires that if the algorithm terminates, then ⋃ᵢ Zᵢ = Sol, i.e. the finite output characterizes exactly all correct parameter valuations.

Soundness: This criterion also provides some guarantee when the algorithm diverges. It requires that all enumerated solutions are correct, i.e. ⋃ᵢ Zᵢ ⊆ Sol.

Completeness: We call a semi-algorithm complete if it enumerates all solutions, i.e. Sol ⊆ ⋃ᵢ Zᵢ. Enumerating p = 1, p = 2, . . . is complete for A₄.

Note that for reachability, a simple Breadth-First Search (BFS) over the PZG would yield a sound and complete (but not always terminating) semi-algorithm. For liveness, this is insufficient: the algorithm would miss infinite paths that do not form a cycle. Still, the following trivial semi-algorithm, EnumQ, would be sound and complete: "Enumerate all rational parameter valuations v, decide if v(A) has an accepting loop [1] and, if so, emit {v}." Although it is sound and complete, this algorithm is quite unsatisfactory, since it will never terminate, and it will never aggregate solutions into larger polyhedra. To distinguish PZG-based algorithms, we need a weaker form of completeness.

Completeness for symbolic lassos: A semi-algorithm is complete for symbolic lassos if it enumerates all parameter valuations leading to accepting lassos in the PZG, i.e. ⋃ᵢ Zᵢ contains s↓P whenever the PZG contains an accepting lasso on s.

Completeness for symbolic lassos is weaker than completeness, since it may miss parameter valuations v for which v(A) has an accepted run, but this only happens when the PZG has an infinite path that does not end in a cycle.

<sup>1</sup> It is not even obvious that ⋂ᵢ(Zᵢ↓P) can be represented by a finite conjunction.

### **4 Semi-Algorithms for Liveness Parameter Synthesis**

In this section, we discuss three semi-algorithms for liveness parameter synthesis. In Sec. 4.1, we discuss the previous approach [11,25], based on Nested Depth-First Search (NDFS). All NDFS-based variants turn out to be incomplete for symbolic lassos. In Sec. 4.2, we introduce a simple algorithm based on Breadth-First Search (BFS), which analyses the Strongly Connected Components (SCC) at each new level. We show that the BFS-based algorithm is complete for symbolic lassos. Finally, Sec. 4.3 introduces our new Bounded Synthesis with Iterative Deepening (BSID) algorithm. BSID is also complete for symbolic lassos, and it is compatible with all NDFS enhancements.

#### **4.1 Nested Depth-First Search with Enhancements**

The NDFS algorithm (Alg. 1) is run on the PZG, with initial state s0, accepting states <sup>A</sup>, and next-state(s) enumerating the <sup>⇒</sup>-successors. We first explain basic NDFS [13], cf. the uncoloured parts of Alg. 1. The goal of the outer blue search (ll.4–13) is to visit all states in DFS order, and just before backtracking, call the red search on all accepting states (l.12). Note that states on the DFS stack are cyan (l.6), and states that are handled completely are blue (l.13). The goal of the inner red search (ll.14–21) is to detect if there is an accepting cycle. It colours visited states red (l.16), to ensure that states are visited at most once. It reports an accepting cycle (l.20) when a cyan state is encountered.
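As a point of reference, the uncoloured core of Alg. 1 (plain NDFS, without pruning, subsumption or successor reordering) can be sketched on an explicit finite graph. The graph and state names below are invented for illustration:

```python
# Executable sketch of basic two-colour NDFS: the blue (outer) search
# visits states in DFS post-order and launches the red (inner) search
# from each accepting state; the red search reports a cycle when it
# reaches a state still on the blue DFS stack (cyan).

def ndfs(graph, s0, accepting):
    cyan, blue, red = set(), set(), set()
    cycles = []

    def dfs_blue(s):
        cyan.add(s)                      # s is on the DFS stack
        for t in graph.get(s, []):
            if t not in blue and t not in cyan:
                dfs_blue(t)
        if s in accepting:               # post-order: start red search
            dfs_red(s)
        blue.add(s)
        cyan.discard(s)

    def dfs_red(s):
        red.add(s)                       # visit each state at most once
        for t in graph.get(s, []):
            if t in cyan:                # closes a cycle through an
                cycles.append(t)         # accepting state: report it
            elif t not in red:
                dfs_red(t)

    dfs_blue(s0)
    return cycles

# a -> b -> c -> b (accepting cycle on b), plus a dead branch a -> d
g = {"a": ["b", "d"], "b": ["c"], "c": ["b"], "d": []}
print(ndfs(g, "a", accepting={"b"}))   # ['b']
```

In the parametric setting of Alg. 1, the report additionally projects the cycle state onto its parameters and accumulates the result in Constraints.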

Cumulative pruning (pink) [11,25]. For synthesis, we collect the Constraints that lead to accepting cycles (l.20). We prune the search when the parametric constraint of some state is included in Constraints (l.5,15). This is justified by Prop. 1, since all successors of the pruned state will have an even smaller parametric constraint. Prop. 1 also implies that all states on a cycle have the same parametric constraint. So we also prune the red search, by restricting the search for a cycle to the current parametric constraint (l.18).

Subsumption (grey) [22,25]. This pruning strategy takes advantage of the subsumption relation between states. The accepting lassos reachable from red states s are already included in Constraints. By Prop. 3, any lasso on a state t ⊑ t′ can be simulated by t′. Hence, we immediately prune the search when we encounter a state t ⊑ Red, i.e. ∃t′ ∈ Red. t ⊑ t′ (l.11,21). We exploit the subsumption structure once more: if Cyan ⊑ t, i.e. ∃t′ ∈ Cyan. t′ ⊑ t (l.19), we have found an accepting spiral, which by Prop. 3 implies there is an accepted run.

Lookahead (yellow). The lookahead strategy is new (in this context) and allows for early detection of accepting cycles in dfsBlue. It looks for a transition to a cyan state (l.7), which is on the DFS stack. If the source or target of this transition is accepting, then the cycle is accepting as well, and it is reported at l.8.

Accepting first (blue). This is a new strategy, aimed at increasing the chance of finding an accepting cycle early in the search, to promote more pruning. It simply works by picking accepting successors before their siblings at l.10,17.

```
Alg. 1 Collecting ndfs with strategies:
cumulative pruning, subsumption, lookahead, accepting first
 1: procedure NDFS
 2:   Cyan := Blue := Red := ∅ ; Constraints := ∅
 3:   dfsBlue(s0)
 4: procedure dfsBlue(s)
 5:   if s↓P ⊆ Constraints then Blue := Blue ∪ {s} ; return
 6:   Cyan := Cyan ∪ {s}
 7:   if ∃s' ∈ next-state(s) ∩ Cyan : (s ∈ A ∨ s' ∈ A) then
 8:     Constraints := Constraints ∪ {s'↓P}      ▹ Report cycle at state s'
 9:   else
10:     for all t ∈ Reordered-next-state(s) do
11:       if t ∉ Blue ∪ Cyan ∧ t ⋢ Red then dfsBlue(t)
12:     if s ∈ A then dfsRed(s)
13:   Blue := Blue ∪ {s} ; Cyan := Cyan \ {s}
14: procedure dfsRed(s)
15:   if s↓P ⊄ Constraints then
16:     Red := Red ∪ {s}
17:     for all t ∈ Reordered-next-state(s) do
18:       if t↓P = s↓P then
19:         if Cyan ⊑ t then
20:           Constraints := Constraints ∪ {t↓P} ▹ Report cycle at state t
21:         else if t ⋢ Red then dfsRed(t)
```
Layering (not shown here) [25]. The layering strategy gives priority to states with large parametric constraints, since these potentially prune many other states. To this end, successors in the next parametric layer are delayed, which is sound, since every cycle must lie entirely in the same parametric layer (Prop. 1).

**Proposition 4.** All mentioned NDFS variants are sound and partially correct.

Proof. Partial correctness is shown in [25]. Soundness follows from Prop. 3, since all collected constraints correspond to accepting spirals.

Example 6. None of the mentioned NDFS variants is complete for symbolic lassos. Consider A₅ in Fig. 6. Its PZG extends Fig. 3(b) with a transition from all states to one additional accepting state with a self-loop, s = (ℓ₁, p + x ≥ y ≥ 6 + x), where s↓P = (p ≥ 6). All NDFS variants (including all combinations of cumulative pruning, subsumption, lookahead, accepting first, and layering) allow the execution that diverges on the infinite p ≥ 5 path, so they will never detect the accepting cycle on p ≥ 6.

**Fig. 6.** PTA A₅

#### **4.2 Breadth-First Search**

We now describe a BFS-based synthesis algorithm for accepting cycle detection. As in Alg. 1, our BFS algorithm maintains a parameter constraint Constraints, initially empty. The algorithm basically explores the newly computed symbolic states in a breadth-first search manner, i.e. by iteratively computing all siblings at a given depth level, before computing their own children states. Then, whenever one of these new states is identical to a state already present in the state space, a cycle may exist. In this case, we run an SCC-detection algorithm (inspired by Tarjan) and, if there is indeed a cycle, we add the cycle parameter constraint to the result Constraints. Remember that, from Prop. 1, all states in such a cycle have the same parametric constraint.

Note that, in contrast to the algorithms in Sec. 4.1 and 4.3, we have to use state equality, since using unrestricted subsumption could introduce spurious cycles (cf. examples in [22]). However, we do use cumulative pruning, as in Sec. 4.1: whenever the parametric constraint of a new state s is included in the current result Constraints (i.e. <sup>s</sup>↓<sup>P</sup> <sup>⊆</sup> Constraints), we discard it, as no potential loop starting from this state, or from its successors, can improve Constraints anyhow.

In contrast to the NDFS-based algorithms in Sec. 4.1, our BFS algorithm is complete for symbolic lassos, since every lasso will appear at some level, and the SCC algorithm will eventually detect it.

**Proposition 5.** The BFS+SCC algorithm is sound, partially correct, and complete for symbolic lassos.
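A minimal executable sketch of this BFS+SCC scheme on an explicit finite graph (our simplification; IMITATOR's actual implementation works on symbolic states and accumulates parametric constraints): explore level by level, and when a newly computed edge closes back into the explored part, run a Tarjan-style SCC pass and report accepting states lying on cycles.

```python
# Level-by-level exploration with SCC detection on closing edges.

def tarjan_sccs(graph, nodes):
    """Tarjan's SCC algorithm restricted to the explored node set."""
    index, low, on, stack, sccs, ctr = {}, {}, set(), [], [], [0]
    def strong(v):
        index[v] = low[v] = ctr[0]; ctr[0] += 1
        stack.append(v); on.add(v)
        for w in graph.get(v, []):
            if w not in nodes:
                continue                 # edge leaves the explored part
            if w not in index:
                strong(w); low[v] = min(low[v], low[w])
            elif w in on:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:           # v is the root of an SCC
            comp = []
            while True:
                w = stack.pop(); on.discard(w); comp.append(w)
                if w == v:
                    break
            sccs.append(comp)
    for v in nodes:
        if v not in index:
            strong(v)
    return sccs

def bfs_scc(graph, s0, accepting):
    seen, frontier, reported = {s0}, [s0], set()
    while frontier:
        nxt, closing = [], False
        for s in frontier:
            for t in graph.get(s, []):
                if t in seen:
                    closing = True       # a cycle may exist
                else:
                    seen.add(t); nxt.append(t)
        if closing:
            for comp in tarjan_sccs(graph, seen):
                cyclic = len(comp) > 1 or comp[0] in graph.get(comp[0], [])
                if cyclic:
                    reported |= set(comp) & accepting
        frontier = nxt
    return reported

g = {"a": ["b"], "b": ["c"], "c": ["b", "d"], "d": []}
print(bfs_scc(g, "a", accepting={"b"}))   # {'b'}
```

In the symbolic algorithm, each reported accepting SCC contributes its (single, by Prop. 1) parametric constraint to Constraints, and states whose projection is already included in Constraints are discarded before they enter the frontier.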

#### **4.3 Bounded Synthesis with Iterative Deepening**

One way to enforce termination is to explore the PZG up to a given depth (Bounded Synthesis). However, this could make the result incomplete. Therefore, as long as there are unexplored states, the bound should be increased (Iterative Deepening), to synthesize parameter valuations for deeper accepting cycles.

Alg. 2 presents this procedure, called BSID. Although all strategies in Sec. 4.1 are compatible with this approach, only cumulative pruning and subsumption are shown in the algorithm. It repeatedly explores the PZG from an initial depth depthinit, incrementing the depth by depthstep at each iteration (l.8). The termination criterion is that the current exploration terminated without ever reaching the current depth bound (l.7); in this case, the result is complete. Neither dfsBlue nor dfsRed goes beyond the current exploration depth (ll.10, 20).

To avoid some duplicate work across iterations, the set of explored states is split using two colours: green states have a descendant that was not completely processed due to the depth limit, and should thus be reconsidered in further iterations; blue states are those whose children have already been completely explored, and thus should not be considered anymore. Hence, at the beginning of an iteration, all colours but blue are reset (l.5). States are coloured green when they are at the depth limit (l.10) or if they have a green successor (l.16). Note that dfsBlue is not called for blue states at l.14, but it may be called for states that were coloured green in the previous iteration and whose colour has since been reset.
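The interplay of the depth bound and iterative deepening can be illustrated on a lazily generated graph with one infinite branch (as in Fig. 3) and one accepting cycle. A plain unbounded DFS that happens to pick the infinite branch first would diverge, while the bounded search finds the cycle once the bound reaches its depth. The graph and names are ours, and the green-state bookkeeping of Alg. 2 is omitted:

```python
# Iterative deepening over a lazily generated (infinite) graph.

def succ(s):
    """Lazy successor function: an infinite branch plus a 3-state cycle."""
    kind, n = s
    if kind == "root":
        return [("inf", 0), ("cyc", 0)]   # infinite branch listed first
    if kind == "inf":
        return [("inf", n + 1)]           # (inf,0) -> (inf,1) -> ... forever
    return [("cyc", (n + 1) % 3)]         # cyc0 -> cyc1 -> cyc2 -> cyc0

def bounded_cycle_search(s0, accepting, depth):
    """DFS up to `depth`; report states closing a cycle through an
    accepting state, and whether the bound was hit (cf. l.7 of Alg. 2)."""
    found, hit = [], [False]
    def dfs(s, stack, d):
        if d >= depth:
            hit[0] = True
            return
        stack.append(s)
        for t in succ(s):
            if t in stack:               # cycle closed on the stack
                if any(u in accepting for u in stack[stack.index(t):]):
                    found.append(t)
            else:
                dfs(t, stack, d + 1)
        stack.pop()
    dfs(s0, [], 0)
    return found, hit[0]

def iterative_deepening(s0, accepting, step=2, max_depth=20):
    depth = step
    while depth <= max_depth:
        found, hit = bounded_cycle_search(s0, accepting, depth)
        if found or not hit:             # success, or exhaustive search
            return found, depth
        depth += step
    return [], depth

print(iterative_deepening(("root", 0), accepting={("cyc", 0)}))
# -> ([('cyc', 0)], 4): the cycle is found once the bound reaches 4
```

Unlike this sketch, which stops at the first reported cycle, Alg. 2 keeps deepening (pruned by the collected constraints) until a whole iteration finishes without hitting the bound.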

**Proposition 6.** The BSID algorithm is sound, partially correct, and complete for symbolic lassos.

Proof. Soundness follows from Prop. 3, since every collected constraint corresponds to an accepting spiral. Completeness for symbolic lassos follows, since every accepting cycle in the PZG is entirely present at some depth. When NDFS is run beyond that depth, it will report the constraint leading to that cycle. Partial correctness follows, since the algorithm only terminates if the last run did not reach the depth-bound, in which case the PZG is searched exhaustively.

Example 7. On both A₂ (Fig. 3, Ex. 4) and A₅ (Fig. 6, Ex. 6), BSID will correctly report p ≥ 5 and then terminate; for A₅ it may first report p ≥ 6, depending on the search order. It is actually the combination of bounded synthesis and subsumption that makes the algorithm complete for this example. The bound ensures that NDFS is run after the first iteration, and subsumption ensures that an accepting spiral is found, as explained in Ex. 4. At this point, the constraint p ≥ 5 is discovered, which prunes the rest of the PZG, ensuring termination.

```
Alg. 2 Iterative deepening ndfs with cumulative constraint pruning and subsumption
 1: procedure IterativeCollectNDFSsub(depthinit, depthstep)
 2:   Cyan := Blue := Red := Green := ∅ ; Constraints := ∅
 3:   depth := depthinit ; again := true
 4:   while again do
 5:     Cyan := Red := Green := ∅ ; depthreached := false
 6:     dfsBlue(s0, 0)
 7:     if ¬depthreached then again := false
 8:     if again then depth := depth + depthstep
 9: procedure dfsBlue(s, ds)
10:   if ds ≥ depth then depthreached := true ; Green := Green ∪ {s} ; return
11:   if s↓P ⊆ Constraints then Blue := Blue ∪ {s} ; return
12:   Cyan := Cyan ∪ {s}
13:   for all t ∈ next-state(s) do
14:     if t ∉ Blue ∪ Green ∪ Cyan ∧ t ⋢ Red then dfsBlue(t, ds+1)
15:   if s ∈ A then dfsRed(s, ds)
16:   if ∃s' ∈ Green ∩ next-state(s) then Green := Green ∪ {s}
17:   else Blue := Blue ∪ {s}
18:   Cyan := Cyan \ {s}
19: procedure dfsRed(s, ds)
20:   if ds < depth ∧ s↓P ⊄ Constraints then
21:     Red := Red ∪ {s}
22:     for all t ∈ next-state(s) do
23:       if t↓P = s↓P then
24:         if Cyan ⊑ t then
25:           Constraints := Constraints ∪ {t↓P} ▹ Report cycle at state t
26:         else if t ⋢ Red then dfsRed(t, ds+1)
```
### **5 Experimental Evaluation**

We conducted some experiments, to compare all algorithms on the number of cases they can solve and on their efficiency. In order to compare cases in which an algorithm does not terminate, we also counted the number of reported cycles.

To this end, we implemented our new algorithms BFS and BSID in IMITATOR 3,<sup>2</sup> and we also reimplemented all NDFS-based algorithms [11,25] in a unified DFS framework. We ran all algorithms on a benchmark, distributed with IMITATOR [4] and also used in [25]. The size of the benchmarks is shown in Tab. 1 (columns L, X, P). We used a timeout of 120 s.<sup>3</sup>

In Tab. 1, we compare some combinations of NDFS enhancements (Sec. 4.1), extending the baseline (cumulative pruning). The results show that subsumption alone performs worst, while lookahead solves more cases, e.g. ll.3–6 of Tab. 1. Interestingly, adding our new accepting first strategy succeeds in finding cycles (l.12) that are missed by all other strategies. Finally, adding the layering approach leads to success in most cases and provides the fastest results on average, but it finds no accepting cycles at all for five cases where the others found some.

Tab. 2 compares the new algorithms BFS (Sec. 4.2) and BSID (Sec. 4.3), including all enhancements (except layering) under various depth settings. BSID is generally faster than BFS, in particular with an iterative depth-step of 5. The performance of BFS is closest to BSID with depth-step 1. The first two columns evaluate the effectiveness of using the green colour (ng = -no-green). Without green, no information from previous iterations is reused; keeping green avoids this recomputation and is faster, leading to a deeper exploration within the time limit (e.g. on l.2).

Comparing both tables, we notice that for ll.15–17 NDFS synthesised some parameter values that are missed by BSID and BFS. BSID is generally faster than its NDFS counterpart A+L+Sub, but NDFS with layering is even faster.

### **6 Case Study: the Bounded Retransmission Protocol**

The Bounded Retransmission Protocol (BRP) has been analysed in [16,14,19], but we now synthesise the most liberal parameter constraints to obtain some reachability and liveness guarantees. For reachability, these constraints are more liberal than proposed in previous work. Synthesising parameter constraints for liveness properties is new, and our new algorithms were required to achieve this.

Our starting point is the PTA model from [14]. Each session starts with a transmission request S_in and is terminated by an indication S_ok, S_nok or S_dk ("don't know"). The BRP is regulated by clocks, with some timing parameters: TD is the delay in the communication channel, while TS and TR indicate the time that the sender (resp. the receiver) should wait. Finally, SYNC models the waiting time in case sender and receiver get out of sync. The maximum number of retransmissions is a discrete parameter, which we fixed in most experiments to MAX = 2.

<sup>2</sup>Algorithms are integrated in IMITATOR v.3. The artifact is at doi.org/10.5281/zenodo.4115919 and can be run at imitator.lipn.univ-paris13.fr/artifact.

<sup>3</sup>The experiment ran on a DELL PowerEdge FC640, 2 processors (Intel Xeon Silver 4114 @ 2.20 GHz), Debian GNU/Linux 10, 187.50 GiB memory.


**Table 1.** Comparing various NDFS enhancements. For each model, L denotes the number of locations, X the number of clocks, and P the number of parameters. For each algorithm, column d indicates the actual depth reached, m the minimum depth at which a cycle was found, c the total number of cycles found, s the number of states explored, and t the time spent in the algorithm (discarding parsing the model) in seconds. # terminations indicates the number of benchmarks for which the algorithm terminates, and # fastest how many times it performed best. Finally, we computed for each algorithm the Average Normalised Time over all benchmarks, where we normalised the time w.r.t. the largest time used by any algorithm in Tab. 1 and 2. Timeout values get a normalised time of 1.


### **6.1 Synthesis for Reachability Properties: deriving sharper bounds**

To illustrate synthesis for reachability properties, we first enhance the parametric verification experiments from [14,19] in IMITATOR. The reachability properties are: **(C)** the channels will never be used simultaneously; and **(R)** the receiver gets a correct initial frame in each session. Property **(C)** is formalised as:

property := #synth AGnot(loc[channelK] = in_transitK & loc[channelL] = in_transitL)

We synthesise the safe parameter constraints for "unreachability" by:<sup>4</sup>

imitator -mergeq -comparison inclusion brp_Channels.imi brp_Channels.imiprop

IMITATOR derives within 2 s the exact constraint TS > 2\*TD: the sender should wait (TS) at least the round-trip time of a message plus its acknowledgement (2\*TD).

Property **(R)** is formalised by adding an error location FailureR to the receiver, which should be unreachable. Since we learned the constraint TS > 2\*TD in the previous run, we now include this constraint in the initial condition. Within 1 s, IMITATOR synthesises the exact constraint for this safety property:

imitator -mergeq -comparison inclusion brp_RC.imi brp_RC.imiprop

SYNC + TS >= TR + TD & TS > 2∗TD & TR > 4∗TS + 3∗TD

The fact that this can be computed is not surprising, but it is surprising that this constraint is more liberal than the one derived in [14,19], which was:

SYNC >= TR & TS > 2∗TD & TR > 2∗MAX∗TS + 3∗TD

One can easily check that, for MAX = 2, their constraint is strictly stronger than ours. So we found more parameter values for which BRP is correct. By construction, we found the most liberal constraint for MAX = 2, and we confirmed a similar result for up to MAX = 20. We cannot handle a parametric MAX.
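
This check can be replayed numerically. The following Python sketch (ours, independent of IMITATOR) verifies on random valuations that, for MAX = 2, the older constraint implies the new one, and exhibits a witness valuation that satisfies our constraint but not theirs:

```python
import random

def old_constraint(TS, TD, TR, SYNC, MAX=2):
    # constraint from [14,19]: SYNC >= TR & TS > 2*TD & TR > 2*MAX*TS + 3*TD
    return SYNC >= TR and TS > 2*TD and TR > 2*MAX*TS + 3*TD

def new_constraint(TS, TD, TR, SYNC):
    # our constraint: SYNC + TS >= TR + TD & TS > 2*TD & TR > 4*TS + 3*TD
    return SYNC + TS >= TR + TD and TS > 2*TD and TR > 4*TS + 3*TD

random.seed(0)
samples = [tuple(random.uniform(0, 100) for _ in range(4))
           for _ in range(100_000)]

# Their constraint implies ours on every sampled valuation ...
assert all(new_constraint(*v) for v in samples if old_constraint(*v))
# ... and ours is strictly more liberal: a valuation they exclude.
witness = (1.0, 0.05, 10.0, 9.5)   # (TS, TD, TR, SYNC)
assert new_constraint(*witness) and not old_constraint(*witness)
```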

### **6.2 Liveness: approximations by bounded synthesis**

Next, we want to measure the overhead of liveness checking. To this end, we add a self-loop to the FailureR location, mark it accepting, and use a liveness property. Note that in this case, the synthesised constraint will indicate the error condition.

```
accepting loc FailureR: invariant True when True goto FailureR;
init := ... & TS > 2 ∗ TD
property := #synth CycleThrough(accepting)
```
Since we search for an accepting loop, inclusion and merging are unsound, but still complete. However, we can safely apply subsumption in NDFS. Without inclusion, the zone graph is infinite, so we are forced to resort to bounded synthesis, which only provides an under-approximation. Hence, we also use iterative deepening (BSID, Sec. 4.3). The depth limit is reached in 6 s.

<sup>4</sup>Inclusion and merging are sound and complete for reachability [7]. Inclusion applies maximal subsumption, while merging combines zones with exact convex hull.

```
imitator brp RC.imi accepting.imiprop -depth-step=5 -depth-limit=25 -recompute-green
   4∗TS + 3∗TD >= TR & TS > 2∗TD
OR TR + TD > SYNC + TS & TS > 2∗TD
```
We could have searched even deeper for more liberal constraints, but it can be easily checked that this error constraint is equivalent to the complement of the safety constraint (within the initial condition), see Sec. 6.1, property **(R)**. Hence, we can conclude that we have already synthesised the exact constraint.
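
The equivalence claim reduces to De Morgan's law on the two non-trivial conjuncts of the safety constraint, and can be spot-checked in a few lines of Python (a sanity check of ours, not produced by IMITATOR):

```python
import random

def safety(TS, TD, TR, SYNC):
    # exact safe constraint for property (R), Sec. 6.1
    return SYNC + TS >= TR + TD and TR > 4*TS + 3*TD

def error(TS, TD, TR, SYNC):
    # error constraint synthesised above by bounded synthesis
    return 4*TS + 3*TD >= TR or TR + TD > SYNC + TS

# On random valuations inside the initial condition TS > 2*TD, the error
# constraint is exactly the complement of the safety constraint.
random.seed(1)
for _ in range(100_000):
    v = tuple(random.uniform(0, 100) for _ in range(4))
    if v[0] > 2*v[1]:
        assert error(*v) == (not safety(*v))
```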

### **6.3 Proper Liveness Properties**

**GF**(**S_in**). Next, we will synthesise constraints for an actual liveness property, stating that the number of new sessions is infinite. We use Spot [15] to generate a Büchi automaton for the negation of this formula, and add the result as a monitor to the IMITATOR model, synchronising with the sender process. We add the correctness constraints that we learned before to the initial constraints:

init := ... & SYNC >= TR & TS > 2∗TD & TR > 4∗TS + 3∗TD

The following command tries to synthesise all parameters (within the initial constraint) for which an accepting loop is reachable, i.e. **GF** S_in is violated. We replaced subsumption by full inclusion, since otherwise IMITATOR gets lost in the infinite parametric zone graph. Recall that inclusion is complete but unsound for NDFS, so this provides an over-approximation of the constraints.

imitator -no-subsumption -comparison inclusion brp_GF_S_in_RC.imi accepting.imiprop

IMITATOR replies False in 1 s, so there is no reachable accepting cycle. Since this was an over-approximation, the result is conclusive: **GF** S_in holds under all parameter values inside this initial constraint. Note that, in principle, the property could be violated outside this initial condition. We can rerun the same experiment with the more general initial condition TS > 2\*TD. IMITATOR confirms that the property still holds, but checking this larger space takes 19 s.

**G**(**S_in** ⇒ **F**(**S_ok** ∨ **S_nok** ∨ **S_dk**)). Using the same method, IMITATOR confirms in 16 s that this response property also holds: every session start is followed by some indication.

imitator -no-subsumption -comparison inclusion brp_GSinFSdk.imi accepting.imiprop

**G**(**S_in** ⇒ **F**(**S_ok** ∨ **S_nok**)). Let us pretend that we forgot the indication S_dk (don't know). This time, we search for a symbolic counter-example (using the option -witness), under the initial condition TS > 2\*TD.

property := #witness CycleThrough(accepting)

imitator brp_GSinFSnok.imi accepting_one.imiprop

As expected, IMITATOR finds a counter-example quickly (within 0.04 s).

### **7 Conclusion**

We presented and evaluated new semi-algorithms solving the liveness parameter synthesis problem for Parametric Timed Automata. We also introduced new soundness and completeness notions for such semi-algorithms. The new algorithms, based on BFS and Bounded Synthesis (BSID), at least enumerate all parameters leading to accepting lassos in the parametric zone graph. We showed that this property does not hold for all previous algorithms, which were based on NDFS. Our new algorithms are less sensitive to the particular search order than the previous NDFS algorithms, which could get stuck in some branch of the PZG.

Tab. 3 (left) shows the soundness and completeness status of all considered algorithms. Full inclusion and BS-n can only provide an over-approximation and an under-approximation, respectively. The enumQ algorithm is complete, but never terminates (indicated by ××), so its partial soundness and completeness results are vacuous (indicated by (-)). Although the problem is undecidable, one might still hope for an algorithm that enumerates all possible solutions (like enumQ, which generates and tests all rational solutions) and produces a finite set of aggregate solutions (if one exists). Such an algorithm should terminate for practical cases.

Tab. 3 (right) shows the results of our algorithms for examples A1–A6. They either terminate with an exact (-) or partial ((-)) result, or diverge (×). In one case the addition of the layers strategy is needed to obtain a partial result ((L)).

Our last example shows another challenge to obtain a complete approach. The PZG of PTA A6 has a non-cyclic infinite path. It seems non-trivial to compute its limit constraint automatically. After n steps, the parametric constraint is p ≥ n × q. So the limit constraint is q = 0 ∧ p ≥ q.

In order to handle cases where the set of solutions is not even a finite union of convex sets (Fig. 5), an entirely different representation of the solutions would be required.

**Fig. 7.** PTA A6 (invariant x ≤ q ∧ y ≤ p; edge guard x ≥ q; reset x := 0).

Finally, exploiting the component-based structure of networks of PTA using a compositional approach, such as the one developed recently for fair paths in infinite systems [12], would be an exciting extension.


**Table 3.** Soundness and completeness properties of various algorithms.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Algebraic Quantitative Semantics for Efficient Online Temporal Monitoring**

Konstantinos Mamouras, Agnishom Chattopadhyay, and Zhifu Wang

> Rice University, Houston TX 77005, USA {mamouras, agnishom, zfwang}@rice.edu

**Abstract.** We investigate efficient algorithms for the online monitoring of properties written in metric temporal logic (MTL). We employ an abstract algebraic semantics based on semirings. It encompasses the Boolean semantics and a quantitative semantics capturing the robustness of satisfaction, which is based on the max-min semiring over the extended real numbers. We provide a precise equational characterization of the class of semirings for which our semantics can be viewed as an approximation to an alternative semantics that quantifies the distance of a system trace from the set of all traces that satisfy the desired property.

**Keywords:** Online Monitoring · Verification · Quantitative Semantics.

### **1 Introduction**

Online monitoring is a lightweight verification technique for checking during runtime that a system behaves as desired. It has proved to be effective for evaluating the correctness of the behavior of complex systems, which includes cyber-physical systems (CPSs) that consist of both computational and physical processes. An online monitor is a program that observes the execution trace of the system and emits values that indicate events of interest or other actionable information.

It is common to specify monitors using special-purpose formalisms such as variants of temporal logic and domain-specific programming languages. In the context of cyber-physical systems, logics that are interpreted over signals are frequently used. This includes Metric Temporal Logic (MTL) [30] and Signal Temporal Logic (STL) [33]. We focus here on properties specified with MTL and interpreted over discrete-time signals. We do not restrict the outputs of the monitor to Boolean (qualitative) verdicts, but allow for a quantitative interpretation of property satisfaction that admits various degrees of truth or falsity. Such quantitative interpretations of temporal logic have been considered before, including several variants of the so-called robust semantics of MTL [22,20,5].

Our starting point is the widely-used spatial robust semantics of MTL [22]. This uses the set R±<sup>∞</sup> = R ∪ {−∞, ∞} of the extended real numbers as truth values, where a positive number indicates truth, a negative number indicates falsity, and zero is ambiguous. Disjunction is interpreted as max, and conjunction is interpreted as min. Two quantitative semantic notions are considered in [22]:

© The Author(s) 2021

J. F. Groote and K. G. Larsen (Eds.): TACAS 2021, LNCS 12651, pp. 330–348, 2021.

https://doi.org/10.1007/978-3-030-72016-2\_18

(1) the robustness degree degree(ϕ, u) of a trace u w.r.t. a formula ϕ, which is defined in a global way using distances between signals, and (2) the robust semantics ρ(ϕ, u) of a formula ϕ w.r.t. a trace u, which is defined by induction on the structure of ϕ. The former notion is the primary definition that captures the intuitive idea of the degree of satisfaction, whereas the latter is used as an approximate estimate. The usefulness of this estimate is justified by establishing a precise relationship between the two values [22]. The robust semantics of [22] has been used in prior work on online monitoring [16,15].

We embark on an investigation of how to generalize the robustness framework of [22] to other notions of quantitative truth values. Instead of focusing exclusively on the concrete structure (R±∞, sup, inf, −∞, ∞), we take an abstract algebraic approach and look at classes of structures that are defined axiomatically. We start by considering the class of semirings, algebraic structures of the form (V, +, ·, 0, 1) with an addition operation + (which models disjunction) and a multiplication operation · (which models conjunction) satisfying a set of equational laws. The class of semirings contains B = {⊥, ⊤} (the Boolean values), (R±∞, max, min, −∞, ∞), the max-plus (tropical) semiring (R∪{−∞}, max, +, −∞, 0), and (R, +, ·, 0, 1). The semiring of intervals with (semiring) addition given by [a, b] ⊕ [c, d] = [max(a, c), max(b, d)] and (semiring) multiplication given by [a, b] ⊗ [c, d] = [min(a, c), min(b, d)] is an especially interesting example, as it can be used to model uncertainty in the truth value: an element [a, b] indicates that the truth value lies somewhere within this interval.

We use an algebraic generalization of the inductively-defined robust semantics of [22], as our goal is to obtain online monitors that are time- and spaceefficient. Our main results are the following:


We provide an implementation of our algebraic monitoring framework in Rust. Our experiments show that our monitors scale reasonably well and they compare favorably against the state-of-the-art monitoring tool Reelay [40].

### **2 Algebraic Semantics using Semirings**

A semiring is an algebraic structure (V, +, ·, 0, 1), where + is called addition and · is called multiplication, that satisfies the following properties: (1) (V, +, 0) is a commutative monoid, (2) (V, ·, 1) is a monoid, (3) multiplication distributes over addition, and (4) 0 is an annihilator for multiplication. The last two properties say that x(y+z) = xy+xz, (x+y)z = xz+yz, and 0x = x0 = 0 for all x, y, z ∈ V . We sometimes write xy to mean x·y. A semiring V is called idempotent if addition is idempotent, that is, x + x = x for every x ∈ V . For an idempotent semiring, we define the partial order induced by + as follows: x ≤ y iff x + y = y. A homomorphism from a semiring U to a semiring V is a function h : U → V that commutes with the semiring operations. An epimorphism is a surjective homomorphism. Let U and V be idempotent semirings and h : U → V be a semiring homomorphism. Then, h is monotone (i.e., order-preserving).

**Example 1.** The set B = {⊥, ⊤} of Boolean values with disjunction and conjunction is a semiring. The set T = {⊥, ?, ⊤} can be endowed with a semiring structure as follows: x + ⊥ = x, x + ⊤ = ⊤, ? + ? = ?, x · ⊥ = ⊥, x · ⊤ = x, and ? · ? = ?, where · is commutative. The structure T is used to give a three-valued interpretation of formulas (? is inconclusive). The structure (R±∞, max, min, −∞, ∞) is the max-min semiring over the extended reals. The structure (R, +, ·, 0, 1) is a semiring and Z (integers) and N (natural numbers) are subsemirings of it.
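
These finite structures are straightforward to encode. Below is a minimal Python sketch (the interface and all names are ours, not from the paper's Rust implementation) of a semiring record with the Boolean, three-valued and max-min instances, including n-ary sum and product with the conventions Σ = 0 and Π = 1 for the empty tuple:

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Callable

@dataclass(frozen=True)
class Semiring:
    zero: Any
    one: Any
    add: Callable[[Any, Any], Any]   # models disjunction
    mul: Callable[[Any, Any], Any]   # models conjunction

    def sum(self, xs):               # finite semiring sum; empty tuple -> 0
        return reduce(self.add, xs, self.zero)

    def prod(self, xs):              # finite semiring product; empty tuple -> 1
        return reduce(self.mul, xs, self.one)

# The Boolean semiring B.
BOOL = Semiring(False, True, lambda x, y: x or y, lambda x, y: x and y)

# The three-valued semiring T = {'F', '?', 'T'}: with the order F < ? < T,
# the operation table of Example 1 is exactly (max, min).
_rank = {'F': 0, '?': 1, 'T': 2}
TRI = Semiring('F', 'T',
               lambda x, y: max(x, y, key=_rank.get),
               lambda x, y: min(x, y, key=_rank.get))

# The max-min semiring over the extended reals.
INF = float('inf')
MAXMIN = Semiring(-INF, INF, max, min)
```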

We interpret the max-min semiring R±∞ as degrees of truth, where positive means true and negative means false. The value 0 is ambiguous. For this reason we also consider a variant of R±∞, where the value 0 is refined into a positive +0 (true) and a negative −0 (false). We thus obtain the max-min semiring R<sup>±∞</sup><sub>±0</sub>, which is isomorphic to B × R<sub>≥0</sub>, where R<sub>≥0</sub> = {x ∈ R | x ≥ 0}.

For integers i, j ∈ Z we define the intervals [i, j] = {n ∈ Z | i ≤ n ≤ j} and [i, ∞) = {n ∈ Z | i ≤ n}. For a set I of integers and n ∈ Z, define n + I = {n + i | i ∈ I} and n − I = {n − i | i ∈ I}.

For a semiring V , an interval I = [i, j] (where i, j are natural numbers) and an I-indexed tuple x̄ = (x<sub>k</sub>)<sub>k∈I</sub> whose components are in V , we define ∑x̄ = ∑<sub>k∈I</sub> x<sub>k</sub> = x<sub>i</sub> + ··· + x<sub>j</sub> and ∏x̄ = ∏<sub>k∈I</sub> x<sub>k</sub> = x<sub>i</sub> ··· x<sub>j</sub>. If the tuple x̄ is empty (i.e., I = ∅) then we define ∑x̄ = 0 and ∏x̄ = 1.

We will consider formulas of Metric Temporal Logic (MTL) interpreted over traces that are finite or infinite sequences of data items from a set D. We write D<sup>∗</sup> (resp., D<sup>+</sup>) for the set of all finite (resp., non-empty finite) sequences over D, and D<sup>ω</sup> = ω → D for the set of all infinite sequences over D, where ω is the first infinite ordinal (i.e., the set of natural numbers). We also define D<sup>∞</sup> = D<sup>∗</sup> ∪ D<sup>ω</sup>. We write ε for the empty sequence and |u| for the length of a trace u, where |u| = ω if u is infinite. A finite sequence u ∈ D<sup>∗</sup> can be viewed as a function from {0, ..., |u| − 1} to D, that is, u = u(0)u(1) ... u(|u| − 1). We also consider a semiring V whose elements represent quantitative truth values, and unary quantitative predicates p : D → V . We write 1, 0 : D → V for the predicates given by 1(d) = 1 and 0(d) = 0 for every d ∈ D.

The set MTL(D, V ) of *temporal formulas* is built from the atomic predicates p : D → V using the Boolean connectives ∨ and ∧, the unary temporal connectives P<sub>I</sub>, H<sub>I</sub>, F<sub>I</sub>, G<sub>I</sub>, and the binary temporal connectives S<sub>I</sub>, S̄<sub>I</sub>, U<sub>I</sub>, Ū<sub>I</sub>, where I is an interval of the form [i, j] or [i, ∞) with i, j < ω. For every temporal

$$\begin{aligned} \rho(p, u, i) &= p(u(i)) \\ \rho(\varphi \vee \psi, u, i) &= \rho(\varphi, u, i) + \rho(\psi, u, i) & \rho(\varphi \wedge \psi, u, i) &= \rho(\varphi, u, i) \cdot \rho(\psi, u, i) \\ \rho(\mathsf{P}\_{I}\varphi, u, i) &= \sum\_{j \in i - I, \, j \ge 0} \rho(\varphi, u, j) & \rho(\mathsf{H}\_{I}\varphi, u, i) &= \prod\_{j \in i - I, \, j \ge 0} \rho(\varphi, u, j) \\ \rho(\mathsf{F}\_{I}\varphi, u, i) &= \sum\_{j \in i + I, \, j < \vert u \vert} \rho(\varphi, u, j) & \rho(\mathsf{G}\_{I}\varphi, u, i) &= \prod\_{j \in i + I, \, j < \vert u \vert} \rho(\varphi, u, j) \\ \rho(\varphi \, \mathsf{S}\_{I} \, \psi, u, i) &= \sum\_{j \in i - I, \, j \ge 0} \Bigl( \rho(\psi, u, j) \cdot \prod\_{k=j+1}^{i} \rho(\varphi, u, k) \Bigr) \\ \rho(\varphi \, \bar{\mathsf{S}}\_{I} \, \psi, u, i) &= \prod\_{j \in i - I, \, j \ge 0} \Bigl( \rho(\psi, u, j) + \sum\_{k=j+1}^{i} \rho(\varphi, u, k) \Bigr) \\ \rho(\varphi \, \mathsf{U}\_{I} \, \psi, u, i) &= \sum\_{j \in i + I, \, j < \vert u \vert} \Bigl( \prod\_{k=i}^{j-1} \rho(\varphi, u, k) \Bigr) \cdot \rho(\psi, u, j) \\ \rho(\varphi \, \bar{\mathsf{U}}\_{I} \, \psi, u, i) &= \prod\_{j \in i + I, \, j < \vert u \vert} \Bigl( \sum\_{k=i}^{j-1} \rho(\varphi, u, k) + \rho(\psi, u, j) \Bigr) \end{aligned}$$

Fig. 1: Semiring-based quantitative semantics for MTL.

connective X ∈ {P, H, S, S̄, F, G, U, Ū}, we write X<sub>i</sub> as an abbreviation for X<sub>[i,i]</sub> and X as an abbreviation for X<sub>[0,∞)</sub>.

Since we focus in this paper on online monitoring, we restrict attention to the *future-bounded* fragment of MTL, where the future-time temporal connectives are bounded. That is, every U<sub>I</sub> connective is of the form U<sub>[a,b]</sub> for a ≤ b < ω (and similarly for F<sub>I</sub>, G<sub>I</sub>, Ū<sub>I</sub>). We always assume this restriction on formulas.

We interpret the formulas in MTL(D, V ) over traces from D<sup>∞</sup> and at specific time points. The interpretation function ρ : MTL(D, V ) × D<sup>∞</sup> × ω → V , where ρ(ϕ, u, i) is defined when i < |u|, is shown in Fig. 1. We say that the formulas ϕ and ψ are equivalent, and we write ϕ ≡ ψ, if ρ(ϕ, u, i) = ρ(ψ, u, i) for every u ∈ D<sup>∞</sup> and i < |u|. For every formula ϕ and every interval I, it holds that P<sub>I</sub>ϕ ≡ 1 S<sub>I</sub> ϕ, H<sub>I</sub>ϕ ≡ 0 S̄<sub>I</sub> ϕ, F<sub>I</sub>ϕ ≡ 1 U<sub>I</sub> ϕ, and G<sub>I</sub>ϕ ≡ 0 Ū<sub>I</sub> ϕ.
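
The clauses of Fig. 1 translate directly into a recursive evaluator over a finite trace. Below is a hedged Python sketch instantiated to the max-min semiring (the tuple-based formula encoding and all names are ours; this naive offline evaluator re-explores subformulas and is meant only to make the definition concrete, not to be an efficient online monitor):

```python
INF = float('inf')   # max-min semiring (R u {-inf,+inf}, max, min, -inf, +inf)

def rho(phi, u, i):
    """rho(phi, u, i) from Fig. 1 over the max-min semiring.
    Formulas are nested tuples:
      ('atom', p)                      p : D -> V
      ('or', f, g), ('and', f, g)
      ('P'|'H'|'F'|'G', (a, b), f)     bounds inclusive; b=None means infinity
      ('S'|'U', (a, b), f, g)
    """
    def window(a, b, future):
        if future:                     # j in i+I with j < |u|
            return range(i + a, min(i + b, len(u) - 1) + 1)
        lo = 0 if b is None else max(i - b, 0)
        return range(lo, i - a + 1)    # j in i-I with j >= 0

    op = phi[0]
    if op == 'atom':
        return phi[1](u[i])
    if op == 'or':
        return max(rho(phi[1], u, i), rho(phi[2], u, i))
    if op == 'and':
        return min(rho(phi[1], u, i), rho(phi[2], u, i))
    a, b = phi[1]
    if op in ('P', 'F'):               # semiring sum = max
        return max((rho(phi[2], u, j) for j in window(a, b, op == 'F')),
                   default=-INF)
    if op in ('H', 'G'):               # semiring product = min
        return min((rho(phi[2], u, j) for j in window(a, b, op == 'G')),
                   default=INF)
    if op == 'S':                      # max over j of min(psi@j, phi@(j+1..i))
        return max((min([rho(phi[3], u, j)] +
                        [rho(phi[2], u, k) for k in range(j + 1, i + 1)])
                    for j in window(a, b, False)), default=-INF)
    if op == 'U':                      # max over j of min(phi@(i..j-1), psi@j)
        return max((min([rho(phi[2], u, k) for k in range(i, j)] +
                        [rho(phi[3], u, j)])
                    for j in window(a, b, True)), default=-INF)
    raise ValueError(op)

ge = lambda c: ('atom', lambda d: d - c)   # the predicate "x >= c"
```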

We say that a semiring V refines B if there is a semiring homomorphism h : V → B. Notice that h is necessarily an epimorphism because h(0) = ⊥ and h(1) = ⊤. Informally, we think of h<sup>−1</sup>(⊥) as the subset of "false" values and h<sup>−1</sup>(⊤) as the subset of "true" values. In particular, this means that V can be partitioned into true and false values. There are semirings that cannot refine B. For example, the semiring (Z, +, ·, 0, 1) of the integers cannot refine B.

Let h : V → B. For a predicate p : D → V , we say that d ∈ D h-satisfies p, and we write d |=<sub>h</sub> p, if h(p(d)) = ⊤. For u ∈ D<sup>∞</sup> and i < |u| we define the satisfaction relation |=<sub>h</sub> as usual (for atomic formulas: u, i |=<sub>h</sub> p iff u(i) |=<sub>h</sub> p).

**Lemma 2.** Let D be a set of data items, V be a semiring, and h : V → B. The following are equivalent:

(1) The function h is a semiring homomorphism.

(2) u, i |=<sub>h</sub> ϕ iff h(ρ(ϕ, u, i)) = ⊤ for every ϕ : MTL(D, V ), u ∈ D<sup>∞</sup> and i < |u|.

Lemma 2 says that the qualitative semantics |=<sub>h</sub> agrees with the quantitative semantics ρ exactly when h : V → B is a semiring homomorphism. In this case, ρ is more fine-grained and loses no information regarding Boolean satisfaction.

**Lemma 3.** Let D be a set of data items and V be a semiring. The identities of Fig. 2 hold for all formulas ϕ, ψ ∈ MTL(D, V ).

Pϕ ≡ P<sub>1</sub>(Pϕ) ∨ ϕ and Hϕ ≡ H<sub>1</sub>(Hϕ) ∧ ϕ  
ϕ S ψ ≡ (P<sub>1</sub>(ϕ S ψ) ∧ ϕ) ∨ ψ  
P<sub>[a,∞)</sub>ϕ ≡ P<sub>a</sub>Pϕ and H<sub>[a,∞)</sub>ϕ ≡ H<sub>a</sub>Hϕ  
ϕ S<sub>[a,∞)</sub> ψ ≡ P<sub>a</sub>(ϕ S ψ) ∧ H<sub>[0,a−1]</sub>ϕ, for a ≥ 1  
P<sub>[a,b]</sub>ϕ ≡ P<sub>a</sub>P<sub>[0,b−a]</sub>ϕ and H<sub>[a,b]</sub>ϕ ≡ H<sub>a</sub>H<sub>[0,b−a]</sub>ϕ  
ϕ S<sub>[a,b]</sub> ψ ≡ P<sub>a</sub>(ϕ S<sub>[0,b−a]</sub> ψ) ∧ H<sub>[0,a−1]</sub>ϕ, for a ≥ 1  
F<sub>[0,b]</sub>ϕ ≡ F<sub>b</sub>P<sub>[0,b]</sub>ϕ and G<sub>[0,b]</sub>ϕ ≡ G<sub>b</sub>H<sub>[0,b]</sub>ϕ  
F<sub>[a,b]</sub>ϕ ≡ F<sub>b</sub>P<sub>[0,b−a]</sub>ϕ and G<sub>[a,b]</sub>ϕ ≡ G<sub>b</sub>H<sub>[0,b−a]</sub>ϕ  
ϕ U<sub>[a,b]</sub> ψ ≡ G<sub>[0,a−1]</sub>ϕ ∧ F<sub>a</sub>(ϕ U<sub>[0,b−a]</sub> ψ), for a ≥ 1

The identities of Fig. 2 are all shown using the semiring axioms. The identity below can be used to reduce the monitoring of S[0,a] to P[0,a].

$$\varphi \, \mathsf{S}\_{[0,a]} \, \psi \equiv (\varphi \, \mathsf{S} \, \psi) \wedge \mathsf{P}\_{[0,a]} \psi \tag{1}$$

An early occurrence of this idea is in [19], which considers the more general (future-time) form ϕ U<sub>[a,b]</sub> ψ ≡ (ϕ U<sub>[a,∞)</sub> ψ) ∧ F<sub>[a,b]</sub>ψ. Prior work on efficient monitoring [15] uses an algorithm based on it. Specifically, [15] uses a sliding-max algorithm [32], which can be applied to the max-min semiring R±∞ and other similar linear orders, but is not applicable to partial orders or other semirings.
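
The identities above are what make constant-space online monitoring possible: unrolling ϕ S ψ via Fig. 2 needs one stored semiring value, and identity (1) reduces the bounded S[0,a] to that value plus a bounded window for P[0,a]ψ. A hedged Python sketch over the max-min semiring (the class and its state layout are ours; the window maximum is recomputed naively instead of with the sliding-max algorithm of [32]):

```python
from collections import deque

INF = float('inf')

class SinceBounded:
    """Online monitor for  phi S_[0,a] psi  over the max-min semiring,
    using identity (1):  phi S_[0,a] psi == (phi S psi) and P_[0,a] psi,
    and the Fig. 2 unrolling:  phi S psi == (P1(phi S psi) and phi) or psi.
    """
    def __init__(self, phi, psi, a):
        self.phi, self.psi = phi, psi
        self.since = -INF                  # running value of phi S psi
        self.window = deque(maxlen=a + 1)  # last a+1 values of psi

    def step(self, d):
        p, q = self.phi(d), self.psi(d)
        self.since = max(min(self.since, p), q)   # the Fig. 2 recursion
        self.window.append(q)
        return min(self.since, max(self.window))  # conjoin with P_[0,a] psi

# phi = "x >= 0", psi = "x >= 5", bound a = 1, trace u = 3 6 8
mon = SinceBounded(phi=lambda d: d - 0, psi=lambda d: d - 5, a=1)
outs = [mon.step(d) for d in [3, 6, 8]]
```

Each step touches one new sample and at most a+1 window entries; replacing the naive `max(self.window)` with a sliding-maximum structure would give the amortised constant-time cost discussed in [32].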

**Proposition 4.** For a set D with at least two elements and a semiring V , the following are equivalent:


Proposition 4 gives a precise characterization of when the identity (1) applies. This characterization is axiomatic and identifies the class of bounded distributive lattices as the most general class for which the identity is valid. One important implication is that monitors that are based on this identity cannot be used for other semirings such as (R, +, ·, 0, 1) and (N, +, ·, 0, 1).

**Example 5 (Uncertainty).** We want to identify a notion of quantitative truth values in situations where we interpret formulas over a signal x[n] that is not known with perfect accuracy, but we can put an upper and lower bound on each sample, i.e., a ≤ x[n] ≤ b. For example, suppose that we know that 99.9 ≤ x[0] ≤ 100.1 and we want to evaluate the atomic predicate p = "x ≥ 99" at time 0. The truth value can be taken to be the interval [0.9, 1.1] in this case, since there is uncertainty in the distance of the signal value from the threshold.

More concretely, this situation of uncertain input signal can arise in the monitoring of systems where the raw signal is captured at one site, then compressed and transmitted to another site for monitoring. In many resource-constrained settings (e.g., certain IoT systems), the signal has to be compressed with a lossy compression scheme in order to meet network bandwidth constraints. So, at the monitoring site, the exact signal values are not known but can possibly be placed within intervals (depending on the used compression scheme).

In order to model this kind of uncertainty, we consider the set I(R±∞) of intervals of the form [a, b] with a ≤ b and a, b ∈ R±∞. An interval [a, b] ⊆ R±<sup>∞</sup> can be thought of as an uncertain truth value (it can be any one of those contained in [a, b]). For intervals [a, b] and [c, d] we define [a, b]⊕[c, d] = [max(a, c), max(b, d)] and [a, b] ⊗ [c, d] = [min(a, c), min(b, d)]. An interval of the form [a, a] is equal to the singleton set {a}. The structure (I(R±∞), ⊕, ⊗, {−∞}, {∞}) is a semiring.

The semiring I(R±∞) is a partial order (more specifically, it is a bounded distributive lattice) and therefore does not fit existing monitoring frameworks that consider only linear orders (e.g., the max-min semiring R±<sup>∞</sup> of the extended reals and the associated sliding-max/min algorithms).
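
A direct encoding of this interval semiring, sketched in Python (the pair representation and helper names are ours):

```python
INF = float('inf')

def i_add(x, y):
    # semiring addition: [a,b] (+) [c,d] = [max(a,c), max(b,d)]
    (a, b), (c, d) = x, y
    return (max(a, c), max(b, d))

def i_mul(x, y):
    # semiring multiplication: [a,b] (x) [c,d] = [min(a,c), min(b,d)]
    (a, b), (c, d) = x, y
    return (min(a, c), min(b, d))

I_ZERO = (-INF, -INF)   # the singleton {-inf}, the additive identity
I_ONE = (INF, INF)      # the singleton {+inf}, the multiplicative identity

# Example 5: evaluating "x >= 99" on the uncertain sample 99.9 <= x[0] <= 100.1
sample = (99.9, 100.1)
truth = (sample[0] - 99.0, sample[1] - 99.0)   # approximately [0.9, 1.1]
```

Since ⊕ and ⊗ lift max and min pointwise to the endpoints, the result of evaluating any formula is again a well-formed interval (lower endpoint ≤ upper endpoint).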

### **3 Symbolic Quantitative Traces and Languages**

In this section we start our investigation of how to generalize the "robustness degree" of [22] to our abstract algebraic setting. The result of [22] that relates the robustness degree with the robust semantics is an inequality. For this reason, we focus on idempotent semirings, for which there is a natural partial order ≤ induced by semiring addition (x ≤ y iff x + y = y). Since our approach is abstract algebraic (i.e., axiomatic), we have no notion of real-valued distance between elements of D. Moreover, V does not need to be a semiring of real numbers. Instead, we rely on the intuition that for an atomic predicate p : D → V and a data item d ∈ D, the value p(d) gives a degree of truth or falsity. We propose using symbolic traces **x** = p<sub>0</sub>p<sub>1</sub> ... p<sub>n−1</sub>, which are sequences of atomic predicates, in order to compactly represent sets of concrete traces, which are sequences of data items. If each p<sub>i</sub> represents a subset S<sub>i</sub> ⊆ D, then **x** represents the set L = S<sub>0</sub> × S<sub>1</sub> × ··· × S<sub>n−1</sub> = {v<sub>0</sub>v<sub>1</sub> ... v<sub>n−1</sub> | v<sub>i</sub> ∈ S<sub>i</sub> for each i} of concrete traces. Moreover, given a concrete trace u = u<sub>0</sub>u<sub>1</sub> ... u<sub>n−1</sub> ∈ D<sup>n</sup>, we can use the value p<sub>0</sub>(u<sub>0</sub>) · p<sub>1</sub>(u<sub>1</sub>) ··· p<sub>n−1</sub>(u<sub>n−1</sub>) ∈ V as a quantitative measure of how close the trace u is to the set of traces L. We propose the interpretation of a formula ϕ as a language of symbolic traces. This allows us to define the "closeness" of a trace u ∈ D<sup>n</sup> to the specification ϕ as a (semiring) sum of the closeness values w.r.t. each symbolic trace in the symbolic language of ϕ.
We will also see that this interpretation of a formula ϕ as a symbolic language is compatible with the standard interpretation of ϕ as a set of concrete traces. Using these definitions we obtain a generalization of the theorem of [22] that relates the robustness degree with the robust semantics. Additionally, we characterize precisely the class of semirings for which this generalization is possible.

Let V be an idempotent semiring. For predicates p, q : D → V we define p ≤ q if p(d) ≤ q(d) for every d ∈ D. The intuition for p ≤ q is that p is a stronger predicate than q. We write F(D, V ) to denote the set of atomic quantitative predicates, which always includes the predicates 1 and 0. For symbolic traces **x**, **y** ∈ F(D, V )<sup>∞</sup> with λ = |**x**| = |**y**| we define **x** ≤ **y** if **x**(i) ≤ **y**(i) for every i < λ. These relations ≤ on predicates and traces are partial orders. We define the symbolic satisfaction relation |=, where **x**, i |= ϕ says that the formula ϕ : MTL(D, V ) is satisfied by the symbolic trace **x** ∈ F(D, V )<sup>∞</sup> at position i < |**x**|. For atomic formulas, we put **x**, i |= p iff **x**(i) ≤ p. The definition is given by induction on ϕ in the usual way. For a formula ϕ : MTL(D, V ), length λ ∈ ω ∪ {ω} and a position i < λ, we define the symbolic language SL(ϕ, λ, i) = {**x** ∈ F(D, V )<sup>λ</sup> | **x**, i |= ϕ}. For nonempty finite traces **x** ∈ F(D, V )<sup>n</sup> and u ∈ D<sup>n</sup> of the same length, we define **x**[u] = ∏<sub>i=1</sub><sup>n</sup> **x**(i)(u(i)), where n = |**x**| = |u|. Since the semiring multiplication is monotone w.r.t. ≤, we see that **x** ≤ **y** implies **x**[u] ≤ **y**[u] for every u ∈ D<sup>n</sup>. Informally, the value **x**[u] quantifies how close the concrete trace u is to the symbolic trace **x**.

**Example 6.** Let D = R and V = R±<sup>∞</sup>. For c ∈ R, the predicate p = "x ≥ c" is defined by p(d) = d−c for every d ∈ D. The predicate q = "x ≤ c" is given by q(d) = c−d for every d ∈ D. For the symbolic trace **x** = "x ≥ 1" "x ≤ 5" "x ≥ 2" and the concrete trace u = 3 6 8 we get that **x**[u] = min(2, −1, 6) = −1.

Let c, d ∈ R. For the predicates p = "x ≥ c" and q = "x ≥ d" we have that p ≤ q iff d ≤ c. Similarly, for the predicates p = "x ≤ c" and q = "x ≤ d" it holds that p ≤ q iff c ≤ d. Finally, notice that the predicates "x ≥ c" and "x ≤ d" are incomparable. Consider **y** = "x ≥ 0" "x ≤ 7" "x ≥ 1" and observe that **x** ≤ **y**.

For the formula ϕ = p ∧ F1q, where p and q are atomic predicates, we have that SL(ϕ, 2, 0) = {p′ q′ ∈ F(D, V )<sup>2</sup> | p ≤ p′ and q ≤ q′}.
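The evaluation **x**[u] in Example 6 is simply the semiring product (here: min) of the predicate values at each position. A minimal Rust sketch, assuming the max-min semiring on `f64`; the helper names `ge`, `le` and `eval_symbolic` are ours, not the paper's implementation:

```rust
// Max-min robustness semiring: addition = max, multiplication = min,
// multiplicative unit 1 = +∞.
fn ge(c: f64) -> impl Fn(f64) -> f64 { move |d| d - c }   // predicate "x ≥ c"
fn le(c: f64) -> impl Fn(f64) -> f64 { move |d| c - d }   // predicate "x ≤ c"

// x[u] = product (= min) of x(i)(u(i)) over all positions i.
fn eval_symbolic(xs: &[&dyn Fn(f64) -> f64], u: &[f64]) -> f64 {
    xs.iter().zip(u).map(|(p, &d)| p(d)).fold(f64::INFINITY, f64::min)
}

fn main() {
    // Example 6: x = "x ≥ 1" "x ≤ 5" "x ≥ 2" and u = 3 6 8.
    let p1 = ge(1.0); let p2 = le(5.0); let p3 = ge(2.0);
    let x: Vec<&dyn Fn(f64) -> f64> = vec![&p1, &p2, &p3];
    let v = eval_symbolic(&x, &[3.0, 6.0, 8.0]);
    assert_eq!(v, -1.0); // min(2, −1, 6) = −1
    println!("x[u] = {}", v);
}
```

The fold starts from +∞ because that is the multiplicative unit of the max-min semiring.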

The definition of the robustness degree in [22] involves the value −dist(u, L) = −inf<sub>v∈L</sub> dist(u, v) = sup<sub>v∈L</sub> −dist(u, v), where u is a trace, L is a set of traces, and dist is a metric. Notice that this is a supremum over a potentially infinite set. The semirings that we have considered so far have an addition operation that can model a finitary supremum. In order to model an infinitary supremum, we need to consider semirings that have an infinitary addition operation. A complete semiring is an algebraic structure (V, +, Σ, ·, 0, 1), where Σ<sub>i∈I</sub> x<sub>i</sub> is the sum of the I-indexed tuple of elements (x<sub>i</sub>)<sub>i∈I</sub>, that satisfies: (1) Σ<sub>i∈∅</sub> x<sub>i</sub> = 0, Σ<sub>i∈{j}</sub> x<sub>i</sub> = x<sub>j</sub>, Σ<sub>i∈{j,k}</sub> x<sub>i</sub> = x<sub>j</sub> + x<sub>k</sub> for j ≠ k, and Σ<sub>k∈K</sub> Σ<sub>i∈I<sub>k</sub></sub> x<sub>i</sub> = Σ<sub>i∈I</sub> x<sub>i</sub> where I = ⋃<sub>k∈K</sub> I<sub>k</sub> and the index sets (I<sub>k</sub>)<sub>k∈K</sub> are pairwise disjoint; (2) (V, ·, 1) is a monoid; (3) the infinite distributivity properties (Σ<sub>i∈I</sub> x<sub>i</sub>) · y = Σ<sub>i∈I</sub> (x<sub>i</sub>y) and x · (Σ<sub>i∈I</sub> y<sub>i</sub>) = Σ<sub>i∈I</sub> (xy<sub>i</sub>) hold for every index set I and all x<sub>i</sub>, y<sub>i</sub>, x, y ∈ V; and (4) 0 is an annihilator for multiplication. A complete semiring V is idempotent if Σ<sub>i∈I</sub> x<sub>i</sub> = x for every non-empty index set I with x<sub>i</sub> = x for every i ∈ I.
For example, (R±<sup>∞</sup>, max, sup, min, −∞, +∞) is an idempotent complete semiring. For a formula ϕ : MTL(D, V ), a trace u ∈ D<sup>+</sup> and i < n = |u|, we define

$$\mathsf{val}(\varphi, u, i) = \sum\_{\mathbf{x} \in \mathsf{SL}(\varphi, n, i)} \mathbf{x}[u]. \tag{2}$$

Informally, val(ϕ, u, i) is a measure of how close the trace u is to satisfying ϕ at position i. It is an abstract algebraic variant of the robustness degree [22].

**Theorem 7 (Approximation).** Let D be a set of data items and V be an idempotent complete semiring. Then, the following are equivalent:

(1) The multiplication of V is idempotent and 1 is the top element of V .

(2) For every <sup>ϕ</sup> : MTL(D, V ), <sup>u</sup> <sup>∈</sup> <sup>D</sup><sup>+</sup> and i < <sup>|</sup>u|, val(ϕ, u, i) <sup>≤</sup> <sup>ρ</sup>(ϕ, u, i).

Proof. Assume that (1) holds. Let n ≥ 1 be an integer. For a symbolic language L ⊆ F(D, V )<sup>n</sup> and for u ∈ D<sup>n</sup>, we define val(L, u) = Σ<sub>**x**∈L</sub> **x**[u]. Let {L<sub>i</sub>}<sub>i∈I</sub> be a collection of symbolic languages with L<sub>i</sub> ⊆ F(D, V )<sup>n</sup>. Then,

$$\text{val}(\bigcup\_{i \in I} \mathcal{L}\_i, u) = \sum\_{\mathbf{x} \in \bigcup\_{i \in I} \mathcal{L}\_i} \mathbf{x}[u] \le \sum\_{i \in I} \sum\_{\mathbf{x} \in \mathcal{L}\_i} \mathbf{x}[u] = \sum\_{i \in I} \text{val}(\mathcal{L}\_i, u). \tag{3}$$

For symbolic languages L<sub>1</sub>, L<sub>2</sub> ⊆ F(D, V )<sup>n</sup>, define L = L<sub>1</sub> ∩ L<sub>2</sub>, L′<sub>1</sub> = L<sub>1</sub> \ L<sub>2</sub> and L′<sub>2</sub> = L<sub>2</sub> \ L<sub>1</sub>. Then, L<sub>1</sub> = L′<sub>1</sub> ∪ L and L<sub>2</sub> = L′<sub>2</sub> ∪ L. The languages L′<sub>1</sub>, L′<sub>2</sub>, L are pairwise disjoint. So, we have that val(L<sub>1</sub>, u) = x + z and val(L<sub>2</sub>, u) = y + z, where x = val(L′<sub>1</sub>, u), y = val(L′<sub>2</sub>, u) and z = val(L, u). It follows that

$$\mathsf{val}(\mathcal{L}\_1 \cap \mathcal{L}\_2, u) = z = zz \le (x + z)(y + z) = \mathsf{val}(\mathcal{L}\_1, u) \cdot \mathsf{val}(\mathcal{L}\_2, u) \tag{4}$$

by the idempotence of multiplication. This property extends to val(L<sub>1</sub> ∩ ⋯ ∩ L<sub>k</sub>, u) ≤ val(L<sub>1</sub>, u) ⋯ val(L<sub>k</sub>, u). Now, we will prove (2) by induction on ϕ.


The rest of the cases S, S¯, U, U¯ can be dealt with similarly using (3) and (4). The proof that (2) implies (1) is not too difficult, and we therefore omit it.

Theorem 7 could be considered an abstract algebraic counterpart of the result of [22] (page 4268, Theorem 13) for discrete finite traces. We will discuss later how it can be used to obtain the original result (for the max-min semiring R±∞) as a corollary. Additionally, Theorem 7 gives a precise equational characterization of the class of semirings for which the relationship between the two semantics holds.

Let D be a set of data items, V be a semiring and h : V → B. For a formula ϕ : MTL(D, V ), length λ ∈ ω ∪ {ω} and i < λ, we define the concrete trace language CL<sub>h</sub>(ϕ, λ, i) = {u ∈ D<sup>λ</sup> | u, i |=<sub>h</sub> ϕ}. For a symbolic trace **x** ∈ F(D, V )<sup>λ</sup>, we define its (concrete) trace language by CL<sub>h</sub>(**x**) = {u ∈ D<sup>λ</sup> | u |=<sub>h</sub> **x**}, where u |=<sub>h</sub> **x** means that u(i) |=<sub>h</sub> **x**(i) for every i < λ. Lemma 8 below establishes a correspondence between the symbolic and concrete language of a formula ϕ, which we need to connect Theorem 7 to the concrete setting of [22].

**Lemma 8 (Concrete and Symbolic Languages).** Let D be a set of data items, V be an idempotent semiring with top element 1, and h : V → B be a semiring homomorphism. For every formula ϕ : MTL(D, V ), length λ ∈ ω ∪ {ω}, and position i < λ, it holds that CL<sub>h</sub>(ϕ, λ, i) = ⋃<sub>**x**∈SL(ϕ,λ,i)</sub> CL<sub>h</sub>(**x**).

### **4 Relationship with robust semantics**

In this section, we consider the concrete quantitative setting where V is the max-min semiring R±∞. We will obtain the result of [22] that relates the robustness degree with the robust semantics as a consequence of Theorem 7.

A metric space is a set M together with a function dist : M × M → R<sub>≥0</sub>, called a metric, satisfying: (1) dist(x, y) = 0 iff x = y for all x, y ∈ M, (2) dist(x, y) = dist(y, x) for all x, y ∈ M, and (3) dist(x, z) ≤ dist(x, y) + dist(y, z) for all x, y, z ∈ M. Given a metric dist on M we define the distance function Dist as follows:

$$\begin{aligned} \mathsf{dist}: &M \times \mathcal{P}(M) \to \mathbb{R}\_{\geq 0}^{\infty} \\ \mathsf{dist}(x, S) = \inf\_{y \in S} \mathsf{dist}(x, y) \\ \mathsf{dist}(x, \emptyset) = \infty \end{aligned} \qquad \begin{aligned} \mathsf{Dist}: &M \times \mathcal{P}(M) \to \mathbb{R}^{\pm \infty} \\ \mathsf{Dist}(d, S) = \begin{cases} -\mathsf{dist}(d, S), & \text{if } d \notin S \\ \mathsf{dist}(d, \sim S), & \text{if } d \in S \end{cases} \end{aligned}$$

where ∼S = M \ S is the complement of S. Notice that Dist(x, ∅) = −∞.
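As a concrete illustration of dist(x, S) and the signed distance Dist, here is a small Rust sketch; the function names are ours, and for an interval S = [lo, hi] on the reals the case split of Dist specializes to a distance to the nearer endpoint:

```rust
// dist(x, S) = inf over y ∈ S of |x − y|, for a finite sample S ⊆ R.
fn dist_to_set(x: f64, s: &[f64]) -> f64 {
    s.iter().map(|&y| (x - y).abs()).fold(f64::INFINITY, f64::min)
}

// Dist(d, S) = −dist(d, S) if d ∉ S, and dist(d, ∼S) if d ∈ S.
// Specialized here (illustratively) to an interval S = [lo, hi].
fn signed_dist_interval(d: f64, lo: f64, hi: f64) -> f64 {
    if d < lo {
        -(lo - d)               // outside, below: negative distance to S
    } else if d > hi {
        -(d - hi)               // outside, above
    } else {
        (d - lo).min(hi - d)    // inside: distance to the complement ∼S
    }
}

fn main() {
    assert_eq!(dist_to_set(3.0, &[1.0, 7.0]), 2.0);
    assert_eq!(signed_dist_interval(5.0, 0.0, 8.0), 3.0);   // 3 units inside
    assert_eq!(signed_dist_interval(-2.0, 0.0, 8.0), -2.0); // 2 units outside
}
```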

Let D be a metric space of points (data items). Let p be a propositional letter (symbol), and O(p) ⊆ D be its interpretation, that is, the set of points for which p is true. The corresponding quantitative predicate is p : D → R±<sup>∞</sup> given by p(d) = Dist(d, O(p)) for every d ∈ D. Given the metric dist on D, we obtain a metric dist : D<sup>λ</sup> × D<sup>λ</sup> → R<sup>∞</sup><sub>≥0</sub> (on the set of traces of length λ, where λ ∈ ω ∪ {ω}) as follows: dist(u, v) = sup<sub>i<λ</sub> dist(u(i), v(i)). Let CL<sub>O</sub>(ϕ, n, i) = {u ∈ D<sup>n</sup> | u, i |=<sub>O</sub> ϕ} be the set of traces (of length n) that satisfy ϕ at i (defined using the interpretation function O). Corollary 9 below was proved in [22]. We will give a proof that relies on the algebraic variant that we presented earlier.

**Corollary 9.** Let D be a set of data items, and V = R±<sup>∞</sup>. Let ϕ : MTL(D, V ), u ∈ D<sup>n</sup> and i < n (where n ≥ 1). Then, −dist(u, CL<sub>O</sub>(ϕ, n, i)) ≤ ρ(ϕ, u, i).

Proof. We will use the semiring R±<sup>∞</sup><sub>±0</sub> ≅ B × R<sup>∞</sup><sub>≥0</sub> instead of R±<sup>∞</sup>, so that the value 0 is not ambiguous (it can be either true or false when we use R±<sup>∞</sup>). That is, we will have a positive zero +0 (true) and a negative zero −0 (false). The semiring homomorphism h : R±<sup>∞</sup><sub>±0</sub> → B sends the positive (resp., negative) elements to ⊤ (resp., ⊥). We will interpret a predicate symbol p as the quantitative predicate p : D → R±<sup>∞</sup><sub>±0</sub> given by p(d) = −dist(d, O(p)) if d ∉ O(p) and p(d) = +dist(d, ∼O(p)) if d ∈ O(p). Using these definitions, the satisfaction relations |=<sub>O</sub> and |=<sub>h</sub> are the same, hence CL<sub>O</sub> and CL<sub>h</sub> are the same. Now,

$$\begin{split} \mathsf{dist}(u, \mathsf{CL}\_{h}(\varphi, n, i)) &= \mathsf{dist}(u, \bigcup\_{\mathbf{x} \in \mathsf{SL}(\varphi, n, i)} \mathsf{CL}\_{h}(\mathbf{x})) \qquad \text{[Lemma 8]} \\ &= \inf\_{\mathbf{x} \in \mathsf{SL}(\varphi, n, i)} \inf\_{v \in \mathsf{CL}\_{h}(\mathbf{x})} \sup\_{i < n} \mathsf{dist}(u(i), v(i)) \qquad \text{[def. of dist]} \\ &\geq \inf\_{\mathbf{x} \in \mathsf{SL}(\varphi, n, i)} \sup\_{i < n} \inf\_{v \in \mathsf{CL}\_{h}(\mathbf{x})} \mathsf{dist}(u(i), v(i)) \qquad \text{[sup inf} \leq \inf \text{sup]} \\ &= \inf\_{\mathbf{x} \in \mathsf{SL}(\varphi, n, i)} \sup\_{i < n} \inf\_{v(i) \in \mathcal{O}(\mathbf{x}(i))} \mathsf{dist}(u(i), v(i)) \quad \text{[def. of } \mathsf{CL}] \\ &= \inf\_{\mathbf{x} \in \mathsf{SL}(\varphi, n, i)} \sup\_{i < n} \mathsf{dist}(u(i), \mathcal{O}(\mathbf{x}(i))). \qquad \text{[def. of dist]} \end{split}$$

By negating the above inequality we get that

$$-\mathsf{dist}(u,\mathsf{CL}\_h(\varphi,n,i)) \le \sup\_{\mathbf{x}\in\mathsf{SL}(\varphi,n,i)} \inf\_{i < n} -\mathsf{dist}(u(i), \mathcal{O}(\mathbf{x}(i))),$$

which is ≤ Σ<sub>**x**∈SL(ϕ,n,i)</sub> **x**[u] = val(ϕ, u, i), since −dist(u(i), O(**x**(i))) ≤ Dist(u(i), O(**x**(i))) = **x**(i)(u(i)), and in the max-min semiring inf is the product and sup is the sum. From Theorem 7 we get val(ϕ, u, i) ≤ ρ(ϕ, u, i) and therefore −dist(u, CL<sub>O</sub>(ϕ, n, i)) ≤ ρ(ϕ, u, i).

From Corollary 9 we can also obtain ρ(ϕ, u, i) ≤ dist(u, ∼CLO(ϕ, n, i)). This inequality is equivalent to −dist(u, ∼CLO(ϕ, n, i)) ≤ −ρ(ϕ, u, i), which in turn is equivalent to −dist(u, CLO(∼ϕ, n, i)) ≤ ρ(∼ϕ, u, i). The operation ∼ on formulas is a pseudo-negation, that is, ∼ϕ is the formula that results by "dualizing" all connectives and negating the atomic predicates. This operation is meaningful for the semiring R±<sup>∞</sup>. The final inequality is an instance of Corollary 9 for ∼ϕ.

Theorem 7 and Corollary 9 are not used later for the monitoring algorithm. The significance of our theorem is that it can be instantiated to give the existing result from [22]. This serves as a sanity check for our algebraic framework and it supports the semiring-based semantics of Sect. 2.

### **5 Online Monitoring**

For an infinite input trace <sup>u</sup> <sup>∈</sup> <sup>D</sup><sup>ω</sup>, the output of the monitor for the time instant t should be ρ(ϕ, u, t), but the monitor has to compute it by observing only a finite prefix of u. In order for the output value of the monitor to agree with the standard temporal semantics over infinite traces we may need to delay an output item until some part of the future input is seen. For example, in the case of F1p we need to wait for one time unit: the output at time t is given after the input item at time t + 1 is seen. In other words, the monitor for F1p has a delay (the output is falling behind the input) of one time unit. Symmetrically, we can allow monitors to emit output early when the correct value is known. For example, the output value for P1p is 0 in the beginning and the value at time t is already known from time t − 1. So, we also allow monitors to have negative delay (the output is running ahead of the input). The function dl : MTL → Z gives the amount of delay required to monitor a formula. It is defined by dl(p) = 0 and

$$\begin{aligned} \mathsf{dl}(\varphi \wedge \psi) &= \max(\mathsf{dl}(\varphi), \mathsf{dl}(\psi)) & \mathsf{dl}(\varphi \ \mathsf{S}\_{[a,b]} \ \psi) &= \max(\mathsf{dl}(\varphi), \mathsf{dl}(\psi)) - a \\ \mathsf{dl}(\varphi \ \mathsf{S}\_{[a,\infty)} \ \psi) &= \max(\mathsf{dl}(\varphi), \mathsf{dl}(\psi)) - a & \mathsf{dl}(\varphi \ \mathsf{U}\_{[a,b]} \ \psi) &= \max(\mathsf{dl}(\varphi), \mathsf{dl}(\psi)) + b. \end{aligned}$$
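The recursive definition of dl transcribes directly into code. The following Rust sketch covers a representative fragment of the connectives; the enum and names are ours, not the tool's API:

```rust
// A minimal MTL fragment, just enough to compute the delay dl.
enum Mtl {
    Atom,                                    // atomic predicate p
    And(Box<Mtl>, Box<Mtl>),                 // ϕ ∧ ψ
    SinceBnd(Box<Mtl>, Box<Mtl>, i64, i64),  // ϕ S_[a,b] ψ
    SinceInf(Box<Mtl>, Box<Mtl>, i64),       // ϕ S_[a,∞) ψ
    UntilBnd(Box<Mtl>, Box<Mtl>, i64, i64),  // ϕ U_[a,b] ψ
}

fn dl(f: &Mtl) -> i64 {
    match f {
        Mtl::Atom => 0,
        Mtl::And(p, q) => dl(p).max(dl(q)),
        Mtl::SinceBnd(p, q, a, _) => dl(p).max(dl(q)) - a,
        Mtl::SinceInf(p, q, a) => dl(p).max(dl(q)) - a,
        Mtl::UntilBnd(p, q, _, b) => dl(p).max(dl(q)) + b,
    }
}

fn main() {
    // F_1 p ≡ true U_[1,1] p: delay +1 (output falls one step behind).
    let f1p = Mtl::UntilBnd(Box::new(Mtl::Atom), Box::new(Mtl::Atom), 1, 1);
    assert_eq!(dl(&f1p), 1);
    // P_1 p ≡ true S_[1,1] p: delay −1 (output runs one step ahead).
    let p1p = Mtl::SinceBnd(Box::new(Mtl::Atom), Box::new(Mtl::Atom), 1, 1);
    assert_eq!(dl(&p1p), -1);
}
```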

The monitor TL(ϕ) for a formula ϕ is a variant of a Mealy machine. If dl(ϕ) = 0, then TL(ϕ) is precisely a Mealy machine (one output item per input item) with inputs D and outputs V . If ℓ = dl(ϕ) > 0, then TL(ϕ) emits no output for the first ℓ steps and then behaves like a Mealy machine. If ℓ = dl(ϕ) < 0, then TL(ϕ) emits −ℓ items upon initialization and continues to behave like a Mealy machine.

Let A and B be sets. A monitor of type M(A, B) is a state machine G = (St, init, o, next, out), where St is a set of states, init ∈ St is the initial state, o ∈ B∗ is the initial output, next : St × A → St is the transition function, and out : St × A → Opt(B) is the output function, where Opt(B) = B ∪ {nil}.

```
map(op) : M(A, B)
        St = Unit
      init = u
         o = ε
next(s, a) = s
 out(s, a) = op(a)

aggr(b, op) : M(A, B)
        St = B
      init = b
         o = ε
next(s, a) = op(s, a)
 out(s, a) = op(s, a)

emit(n, v) : M(A, A)
        St = Unit
      init = u
         o = v^n
next(s, a) = s
 out(s, a) = a

ignore(n) : M(A, A)
        St = [0, n]
      init = 0
         o = ε
next(s, a) = s + 1, if s < n
next(s, a) = s,     if s = n
 out(s, a) = nil,   if s < n
 out(s, a) = a,     if s = n

wnd(n, v, op) : M(A, A)
        St = Buf(A)
      init = Buf(n, v)
         o = ε
next(s, a) = s.ins(a)
 out(s, a) = s.ins(a).agg(op)

wndV(n, op) : M(A, A)
        St = Buf(A)
      init = Buf()
         o = ε
next(s, a) = s.ins(a)
 out(s, a) = ε, if size(s) < n − 1
 out(s, a) = s.ins(a).agg(op), o/w
```
Fig. 3: Basic building blocks for constructing temporal quantitative monitors.

In Fig. 3 we give several examples of simple monitors that can be used as building blocks. The monitor map(op) applies the function op : A → B elementwise. The monitor aggr(b, op) applies a running aggregation to the input trace, specified by the initial aggregate b : B and the aggregation function op : B × A → B (similar to the fold combinator used in functional programming). The monitor emit(n, v) emits n copies of the value v ∈ A upon initialization and then echoes the input trace. The monitor ignore(n) discards the first n items of the trace and proceeds to echo the rest of the trace. The monitor wnd(n, v, op) performs an aggregation, given by the associative function op : A × A → A, over a sliding window of size n. It initializes the window using the value v : A and emits output at the arrival of every item. The monitor wndV(n, op) is different in that it starts with an empty window and only starts emitting output when the window fills up with n items. We combine monitors using the operations serial composition >> and parallel composition par. In the serial composition G >> H the output trace of G is propagated as the input trace of H. In the parallel composition par(G, H) the input trace is copied to two concurrently executing monitors G and H and their output traces are combined. Both combinators >> and par are given by variants of the product construction on state machines. In the case of par the output traces of G and H may not be synchronized (one may be ahead of the other), which requires some bounded buffering in order to properly align them. The construction for par is described in [37]. Some variants of the combinators of Fig. 3 are part of the StreamQL language [29], which has been proposed for the processing of streaming time series.
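To make the combinator style concrete, here is a toy Rust sketch of map, aggr and serial composition >>; the trait and names are illustrative simplifications of our own, not the paper's Rust implementation, and a monitor here may emit zero or more items per input item:

```rust
// A monitor consumes one input item and may emit zero or more outputs.
trait Monitor<A, B> {
    fn init(&mut self) -> Vec<B>;        // initial output o
    fn step(&mut self, a: A) -> Vec<B>;  // next and out combined
}

struct Map<F>(F); // map(op): apply op elementwise
impl<A, B, F: FnMut(A) -> B> Monitor<A, B> for Map<F> {
    fn init(&mut self) -> Vec<B> { vec![] }
    fn step(&mut self, a: A) -> Vec<B> { vec![(self.0)(a)] }
}

struct Aggr<B, F> { state: B, op: F } // aggr(b, op): running fold
impl<A, B: Clone, F: FnMut(B, A) -> B> Monitor<A, B> for Aggr<B, F> {
    fn init(&mut self) -> Vec<B> { vec![] }
    fn step(&mut self, a: A) -> Vec<B> {
        self.state = (self.op)(self.state.clone(), a);
        vec![self.state.clone()]
    }
}

// Serial composition G >> H: feed every output of G into H.
fn run_serial<A, B: Clone, C>(
    g: &mut dyn Monitor<A, B>, h: &mut dyn Monitor<B, C>, input: Vec<A>,
) -> Vec<C> {
    let mut out: Vec<C> = h.init();
    for b in g.init() { out.extend(h.step(b)); }
    for a in input {
        for b in g.step(a) { out.extend(h.step(b)); }
    }
    out
}

fn main() {
    // TL(P p) over the max-min semiring: map(p) >> aggr(0, +),
    // i.e. a running max of the robustness of p = "x ≥ 1".
    let mut g = Map(|d: f64| d - 1.0);
    let mut h = Aggr { state: f64::NEG_INFINITY, op: |s: f64, a: f64| s.max(a) };
    let out = run_serial(&mut g, &mut h, vec![3.0, 0.5, 2.0]);
    assert_eq!(out, vec![2.0, 2.0, 2.0]);
}
```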

The identities of Fig. 2 suggest that MTL monitoring can be reduced to a small set of computational primitives. In fact, the primitives described earlier are sufficient to specify the monitors, as shown in Fig. 4. We write π<sub>1</sub> : A × B → A for the left projection and π<sub>2</sub> : A × B → B for the right projection.

Let u ∈ D<sup>+</sup> and n = |u|. If n > a then ρ(ϕ S[0,a] ψ, u, n − 1) = ρ(ϕ S ψ, v, a), where v is the suffix of u with a + 1 items. If n ≤ a then ρ(ϕ S[0,a] ψ, u, n − 1) =

```
TL(p)          = map(p)
TL(ϕ ∨ ψ)      = par(TL(ϕ), TL(ψ)) >> map(+)
TL(Pϕ)         = TL(ϕ) >> aggr(0, +)
TL(Paϕ)        = TL(ϕ) >> emit(a, 0)
TL(P[a,∞)ϕ)    = TL(PaPϕ)
TL(ϕ S ψ)      = par(TL(ϕ), TL(ψ)) >> aggr(0, opS)
                 where opS : V × (V × V) → V
                       opS(s, (x, y)) = (s · x) + y
TL(ϕ S[a,∞) ψ) = TL(Pa(ϕ S ψ) ∧ H[0,a−1]ϕ)
TL(ϕ S[0,b] ψ) = par(TL(ϕ), TL(ψ)) >> wnd(b + 1, 0, ⊗S) >> map(π2)
TL(ϕ S[a,b] ψ) = TL(Pa(ϕ S[0,b−a] ψ) ∧ H[0,a−1]ϕ)
TL(Faϕ)        = TL(ϕ) >> ignore(a)
TL(F[a,b]ϕ)    = TL(FbP[0,b−a]ϕ)
TL(ϕ U[0,b] ψ) = par(TL(ϕ), TL(ψ)) >> wndV(b + 1, ⊗U) >> map(π2)
TL(ϕ U[a,b] ψ) = TL(G[0,a−1]ϕ ∧ Fa(ϕ U[0,b−a] ψ))

// sliding-window aggregation over ⊗ (window size n, initial value v)
T[n] buf ← [n; v]               // fill buffer with v (initial values)
for i ← n − 2 to 0 do           // calculate partial aggregates
    buf[i] ← buf[i] ⊗ buf[i + 1]
T agg ← buf[0]                  // initial total aggregate
Nat m ← 0                       // size of new block
T z ← nil                       // aggregate of new block

Function Add(T d):
    if m = n then               // full new block:
        for i ← n − 2 to 1 do   // convert new block to old
            buf[i] ← buf[i] ⊗ buf[i + 1]
        m ← 0                   // empty new block
        z ← nil
    buf[m] ← d                  // evict oldest item, replace with d
    m ← m + 1                   // new block enlarged
    z ← z ⊗ d                   // where nil ⊗ d = d
    if m < n then
        agg ← buf[m] ⊗ z
    else                        // m = n
        agg ← z
```
Fig. 4: Online monitors for bounded-future MTL formulas & sliding aggregation.

ρ(ϕ S ψ, 0<sup>a+1−n</sup>u, a). So, we can implement a monitor for the connective S[0,a] by computing S over a window of exactly a + 1 data items.

**Proposition 10 (Aggregation for** S**,** U**).** Let V be a semiring. For every trace u = u<sub>0</sub>u<sub>1</sub> … u<sub>n−1</sub> ∈ (V × V )<sup>+</sup> of length n = |u|, the values ρ(π<sub>1</sub> S π<sub>2</sub>, u, n − 1) and ρ(π<sub>1</sub> U π<sub>2</sub>, u, 0) can be written as aggregates of the form π<sub>2</sub>(u<sub>0</sub> ⊗ u<sub>1</sub> ⊗ ⋯ ⊗ u<sub>n−1</sub>).
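For the max-min semiring, the aggregation for S in Fig. 4 can be checked directly: opS(s, (x, y)) = (s · x) + y becomes max(min(s, x), y), and folding it over the trace from the initial aggregate 0 = −∞ computes ρ(π1 S π2, u, n − 1). A small sketch, with names of our own:

```rust
// opS(s, (x, y)) = (s · x) + y in the max-min semiring,
// where · = min and + = max.
fn op_s(s: f64, (x, y): (f64, f64)) -> f64 {
    f64::max(f64::min(s, x), y)
}

// ρ(π1 S π2, u, n−1) as a left fold, matching the recursion
// ρ(i) = y_i + x_i · ρ(i−1) with ρ(−1) = 0 = −∞.
fn rho_since(u: &[(f64, f64)]) -> f64 {
    u.iter().fold(f64::NEG_INFINITY, |s, &p| op_s(s, p))
}

fn main() {
    let u = [(1.0, -2.0), (3.0, -1.0), (0.5, -4.0)];
    // step 1: max(min(−∞, 1), −2)   = −2
    // step 2: max(min(−2, 3), −1)   = −1
    // step 3: max(min(−1, 0.5), −4) = −1
    assert_eq!(rho_since(&u), -1.0);
}
```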

Proposition 10 justifies the translation of S[0,b]/U[0,b] into monitors (Fig. 4). Now, we will describe the data structure that performs the sliding aggregation. It is used in Fig. 3 in the monitors wnd and wndV. The implementation is shown in Fig. 4. Suppose that the current window (of size n) is [x<sub>0</sub>, x<sub>1</sub>, …, x<sub>n−1</sub>]. We maintain a buffer of the form [x<sub>n−m</sub>, …, x<sub>n−1</sub>, y<sub>0</sub>, …, y<sub>n−1−m</sub>], where the part [x<sub>n−m</sub>, …, x<sub>n−1</sub>] is the block of newer elements ("new block") and the part [y<sub>0</sub>, …, y<sub>n−1−m</sub>] contains aggregates of the older elements ("old block"). They satisfy the invariant y<sub>i</sub> = x<sub>i</sub> ⊗ ⋯ ⊗ x<sub>n−1−m</sub> for every i = 0, …, n−1−m. We also maintain the aggregate z = x<sub>n−m</sub> ⊗ ⋯ ⊗ x<sub>n−1</sub> of the new block. So, the overall aggregate of the window is agg = y<sub>0</sub> ⊗ z. When a new item d arrives, we evict the aggregate y<sub>0</sub> corresponding to the oldest item x<sub>0</sub> and replace it by d. Thus, the new block is expanded with the additional item d and therefore we also update the aggregates z and agg. When the new block becomes full (i.e., m = n) we convert it to an old block by performing all partial aggregations from right to left. This conversion requires n − 1 applications of ⊗, but it is performed once every n items. So, the algorithm needs O(1) amortized time-per-item.
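A Rust sketch of this sliding-window aggregator, instantiated with ⊗ = min (the struct and field names are ours; the window is initialized with +∞, the unit of min):

```rust
// Sliding-window aggregation for an associative ⊗ (here: min over f64),
// following the old-block/new-block scheme described above.
struct SlidingAgg {
    buf: Vec<f64>,   // new-block items followed by old-block aggregates
    m: usize,        // number of items in the new block
    z: Option<f64>,  // aggregate of the new block (None = empty)
    agg: f64,        // aggregate of the whole window
}

impl SlidingAgg {
    fn new(n: usize, v: f64) -> Self {
        let mut buf = vec![v; n];
        for i in (0..n - 1).rev() {
            buf[i] = buf[i].min(buf[i + 1]); // partial aggregates
        }
        let agg = buf[0];
        SlidingAgg { buf, m: 0, z: None, agg }
    }

    fn add(&mut self, d: f64) -> f64 {
        let n = self.buf.len();
        if self.m == n {
            // convert the full new block into an old block
            for i in (1..n - 1).rev() {
                self.buf[i] = self.buf[i].min(self.buf[i + 1]);
            }
            self.m = 0;
            self.z = None;
        }
        self.buf[self.m] = d; // evict the oldest aggregate, insert d
        self.m += 1;
        self.z = Some(self.z.map_or(d, |z| z.min(d)));
        self.agg = if self.m < n {
            self.buf[self.m].min(self.z.unwrap())
        } else {
            self.z.unwrap()
        };
        self.agg
    }
}

fn main() {
    // window of size 3, initialized with +∞ (the unit of min)
    let mut w = SlidingAgg::new(3, f64::INFINITY);
    let outs: Vec<f64> = [5.0, 2.0, 7.0, 4.0, 1.0]
        .iter().map(|&d| w.add(d)).collect();
    // windows: [∞,∞,5] [∞,5,2] [5,2,7] [2,7,4] [7,4,1]
    assert_eq!(outs, vec![5.0, 2.0, 2.0, 2.0, 1.0]);
}
```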

**Theorem 11.** Let D be a set of data items, V be a semiring, and ϕ : MTL(D, V ) be a bounded-future formula. The monitor TL(ϕ) : M(D, V ) is a streaming algorithm that needs O(2<sup>|ϕ|</sup>) space and O(|ϕ|) amortized time-per-item.

Proof. The algorithm needs space that is exponential in the size of ϕ because of the connectives of the form X[a,∞) and X[a,b]. The monitor uses buffers of size a or b − a. Since the constants a, b are written in binary notation, we need space that is exponential in the size. The O(|ϕ|) amortized time per element hinges on the algorithm of Fig. 4, which is used for S[0,b] and U[0,b]. As discussed earlier, this algorithm needs O(1) amortized time-per-item.

### **6 Experimental Evaluation**

We have implemented our semiring-based monitoring framework in Rust. We compare our implementation with the verified lattice-based monitors of [13] and the monitoring tool Reelay [40]. We perform our experiments using the (R±<sup>∞</sup>, max, min) semiring for truth values, which are approximately represented using 64-bit floating-point numbers.

We have observed that all three tools process items at a roughly constant rate. We summarize the performance of a monitor with the average time it takes to process one data item (i.e., amortized time-per-item). In Fig. 5, we consider formulas X[0,n], Xn, X[n,2n], X[n,∞) where X ∈ {S, P}. We show the time-per-item for the monitors for n = 1, 10, 10<sup>2</sup>, 10<sup>3</sup>, 10<sup>4</sup>, 10<sup>5</sup>, 10<sup>6</sup>. We have also evaluated how the monitors for future temporal connectives scale with respect to the constants in the intervals. In Fig. 6, we benchmark all tools using formulas from the Timescales benchmark [39]. Our monitors are generally more than 100 (resp., 10) times faster than Reelay (resp., the lattice-based tool of [13]).

The profiling tools Valgrind [38] and Heaptrack [41] are used to analyze the memory consumption of the monitors. Our Rust implementation, given a formula, begins by allocating a fixed amount of memory and does not allocate any more memory during the rest of the computation. Reelay allocates and de-allocates memory throughout its execution. The lattice-based monitor is implemented in OCaml (which is a garbage-collected language) and consumes a larger amount of memory. In Fig. 5, we plot the peak memory usage of the monitors. We note that our tool does not seem to be allocating an increasing amount of memory for P<sup>n</sup> and similar formulas. This is because the corresponding monitor for P<sup>n</sup> emits output as early as possible and therefore does not need to use a buffer. In the case of the lattice-based monitor and our tool, we observe that the memory consumption does not depend on the input trace (it only depends on the formula). In the case of Reelay, it appears that the memory consumption depends on the input trace. We have plotted the behavior for two different input traces: one that consists of an increasing sequence of values ("reelay-ascending"), and another one that is decreasing ("reelay-descending"). We have only measured the memory usage of Reelay for up to n = 2<sup>13</sup>, as the execution becomes very slow beyond this value.

We use *case studies* from the automotive domain, which have been suggested as benchmarks for hybrid system verification [25]. The **A**utomatic Transmission System has two input signals (a throttle and a brake) and three output signals: the gear sequence (g<sub>i</sub> for each gear i), the engine rotation speed (in

Fig. 5: Microbenchmark

rpm, denoted ω) and the vehicle speed (denoted v). Based on the suggestions in [25], we consider five properties: A<sub>1</sub> = ω < ω̄, A<sub>2</sub> = (ω < ω̄) ∧ (v < v̄), A<sub>3</sub> = g<sub>1</sub> ∧ Y(g<sub>2</sub>) → YH[0,2.5]g<sub>2</sub> (where Y is notation for P1), A<sub>4</sub> = H[0,5](ω < ω̄) → H[0,2.5](v < v̄) and A<sub>5</sub> = (v > v̄) S̄[0,1] ((ω > ω̄) S̄[0,2] ((¬g<sub>4</sub>) S̄[0,10] ((¬g<sub>3</sub>) S̄ ((¬g<sub>2</sub>) S̄ (¬g<sub>1</sub>))))). All constants in the temporal connectives are in seconds, and we choose the threshold constants v̄ = 120 and ω̄ = 4500. Formula A<sub>3</sub> says that before changing from the second to the first gear, at least 2.5 seconds must first pass. Formula A<sub>4</sub> says that keeping the engine speed low enough should ensure that the vehicle does not exceed a certain speed. Formula A<sub>5</sub> says that changing the gear from the first to the fourth within 10 seconds, and then having the engine speed exceed ω̄, will cause the vehicle speed to exceed v̄. The other case study is a **F**ault-Tolerant Fuel Control System. We monitor two properties. The first is

Fig. 6: Macrobenchmarks

that the fuel flow rate should frequently become and remain non-zero for a sufficient amount of time. We encode this as F<sub>1</sub> = H[0,10]P[0,1](FuelFlowRate > 0). The other property is to ensure that whenever the air-to-fuel ratio goes out of bounds, then within 1 second it should settle back and stay there for a second. This is written as F<sub>2</sub> = (H[0,1] airFuelRatio < 1) S̄[0,2] (airFuelRatio < 1). The experimental results are shown in Fig. 6.

All of our experiments were executed on a laptop with an Intel Core i7 10610U CPU clocked at 2.30GHz and 16GB of memory. Each value reported is the mean of 20 executions of the experiment. The whiskers in the plots indicate the standard deviation across all executions.

### **7 Related Work**

Fainekos and Pappas [22] define the robustness degree of satisfaction in terms of the distance of the signal from the set of desirable ones (or its complement). They also suggest an under-approximation of the robustness degree which can be effectively monitored. This is called the robust semantics and is defined by induction on STL formulas, by interpreting conjunction (resp., disjunction) as min (resp., max) of R±<sup>∞</sup>. Our paper explores this robust semantics (and the related approximation guarantee) in the general algebraic setting of semirings.

In [27], the authors study a generalization of the robustness degree by considering idempotent semirings of real numbers. They also propose an online monitoring algorithm that uses symbolic weighted automata. While this approach computes the precise robustness degree in the sense of [22], the construction of the relevant automata incurs a doubly exponential blowup if one considers STL specifications. In [13], it is observed that an extension of the robust semantics to bounded distributive lattices can be effectively monitored. In this paper, we generalize this semantics by considering semirings (bounded distributive lattices are semirings). Semirings are also used in [9], where the authors consider a spatio-temporal logic. They consider the class of constraint semirings, which require the semiring order to induce a complete lattice. Efforts have been made to define notions of robustness that take temporal discrepancies into account. In [20], we see a definition of temporal robustness by considering the effect of shifting the signal in time. The "edit distance" between discretized signals is proposed as a measure of robustness in [26]. Abbas et al. [3] define a notion of (τ,ε) closeness between signals, which considers temporal and value-based guarantees separately. In [2], a metric based on conformance is put forward for applications in cardiac electrophysiology. Averaging temporal operators are used in [5], which assign a higher value to temporal obligations that are satisfied earlier.

A key ingredient for the efficient monitoring of STL is a streaming algorithm for sliding-window maximum [19,15]. The tool Breach [17,18], which is used for the falsification of temporal specifications over hybrid systems, uses the sliding-maximum algorithm of [32]. In contrast, we use a more general sliding aggregation which applies to any associative operation (not only max/min) and does not require the truth values to be totally ordered.

Different approaches for interpreting future temporal connectives in the context of online monitoring have been studied. While [16] assumes the availability of a predictor to interpret future connectives, [21] considers robustness intervals: the tightest intervals which cover the robustness for all possible extensions of the available trace prefix. Reelay [40] exclusively uses past-time connectives. The transducer-based framework of [37] can be used to monitor rich temporal properties which depend on bounded future input by allowing some bounded delay in the output.

There is a large amount of work on formalisms, domain-specific languages and associated tools for quantitative online monitoring and, more generally, for data stream processing. The synchronous language LOLA [14] has served as the basis for the StreamLAB tool [23], which is used for monitoring cyber-physical systems. Quantitative Regular Expressions [36] and associated automata-theoretic models with registers [7,8,6] have been used to express complex online detection algorithms for medical monitoring [1,4]. There are many synchronous languages and models of computation based on Kahn's dataflow model [28] that have been used for signal processing [31] and embedded controller design [12,11,10]. The construction of online monitors described in Sect. 5 relies on a set of combinators that constitute a simple domain-specific language for stream processing. Our focus here, however, is on providing efficient monitors for MTL formulas with a quantitative semantics, rather than designing a general-purpose language for monitor specification. The compositional construction of automata-based monitors from temporal specifications has also been considered in [34,35,24].

### **8 Conclusion**

We have presented a new efficient algorithm for the online monitoring of MTL properties over discrete traces. We have used an abstract algebraic semantics based on semirings, which can be instantiated to the widely-used Boolean (qualitative) and robustness (quantitative) semantics, as well as to other partially ordered semirings. We also provide a theorem that relates our quantitative semantics with an algebraic generalization of the robustness degree of [22]. We have provided an implementation of our algebraic monitoring framework, and we have shown experimentally that our monitors scale reasonably well and are competitive against the state-of-the-art tool Reelay [40].

### **References**


ing of synchronous systems. In: TIME 2005. pp. 166–174. IEEE (2005). https://doi.org/10.1109/TIME.2005.26



# **Neural Networks**

# **Synthesizing Context-free Grammars from Recurrent Neural Networks**

Daniel M. Yellin<sup>1</sup> and Gail Weiss<sup>2</sup>

<sup>1</sup> IBM, Givatayim, Israel dannyyellin@gmail.com
<sup>2</sup> Technion, Haifa, Israel sgailw@cs.technion.ac.il

**Abstract.** We present an algorithm for extracting a subclass of the context-free grammars (CFGs) from a trained recurrent neural network (RNN). We develop a new framework, *pattern rule sets* (PRSs), which describe sequences of deterministic finite automata (DFAs) that approximate a non-regular language. We present an algorithm for recovering the PRS behind a sequence of such automata, and apply it to the sequences of automata extracted from trained RNNs using the L<sup>∗</sup> algorithm. We then show how the PRS may be converted into a CFG, enabling a familiar and useful presentation of the learned language.

Extracting the learned language of an RNN is important to facilitate understanding of the RNN and to verify its correctness. Furthermore, the extracted CFG can augment the RNN in classifying correct sentences, as the RNN's predictive accuracy decreases when the recursion depth and the distance between matching delimiters of its input sequences increase.

**Keywords:** Model Extraction · Learning Context Free Grammars · Finite State Machines · Recurrent Neural Networks

### **1 Introduction**

Recurrent Neural Networks (RNNs) are a class of neural networks adapted to sequential input, enjoying wide use in a variety of sequence processing tasks. Their internal process is opaque, prompting several works into extracting interpretable rules from them. Existing works focus on the extraction of deterministic or weighted finite automata (DFAs and WFAs) from trained RNNs [18,6,26,3].

However, DFAs are insufficient to fully capture the behavior of RNNs, which are known to be Turing-complete in theory [20], and whose practical power is increased by architecture variants such as LSTMs [14] and by features such as stacks [9,23] or attention [4]. Several recent investigations explore the ability of different RNN architectures to learn Dyck, counter, and other non-regular languages [19,5,28,21], with mixed results.

© The Author(s) 2021
J. F. Groote and K. G. Larsen (Eds.): TACAS 2021, LNCS 12651, pp. 351–369, 2021. https://doi.org/10.1007/978-3-030-72016-2_19

**Fig. 1.** Overview of steps in algorithm to synthesize the hidden language L

While the data indicates that RNNs can generalize and achieve high accuracy, they do not learn hierarchical rules, and generalization deteriorates as the length and 'depth' of the input grow [19,5,28]. Sennhauser and Berwick conjecture that "what the LSTM has in fact acquired is sequential statistical approximation to this solution" instead of "the 'perfect' rule-based solution" [19]. Similarly, Yu et al. conclude that "the RNNs can not truly model CFGs, even when powered by the attention mechanism" [28]. This is in line with Hewitt et al., who note that a fixed-precision RNN can only learn a language of fixed-depth strings [13].

Goal of this paper. We wish to extract a CFG from a trained RNN. In particular, we wish to find a CFG that not only explains the finite language learnt by the RNN, but also generalizes it to strings of unbounded depth and distance.

Our approach. Our method builds on the DFA extraction work of Weiss et al. [26], which uses the L<sup>∗</sup> algorithm [2] to learn the DFA of a given RNN. As part of the learning process, L<sup>∗</sup> creates a sequence of hypothesis DFAs approximating the target language. Our main insight is to treat these hypothesis DFAs as generated by a set of underlying rules that recursively improve each DFA's approximation of the target CFG, increasing the distance and embedded depth of the sequences it can recognize. In this light, synthesizing the target CFG becomes the problem of recovering these rules.

We propose the framework of pattern rule sets (PRSs) for describing such rule applications, and present an algorithm for recovering a PRS from a sequence of DFAs. We also provide a method for converting a PRS to a CFG, and test our method on RNNs trained on several PRS languages. Pattern rule sets are expressive enough to cover several variants of the Dyck languages, which are prototypical context-free languages (CFLs): the Chomsky–Schützenberger representation theorem shows that any CFL can be expressed as a homomorphic image of a Dyck language intersected with a regular language [16].

A significant issue we address is that the extracted DFAs are often inexact, either through inaccuracies in the RNN, or as an artifact of the L<sup>∗</sup> algorithm.

To the best of our knowledge, this is the first work on synthesizing a CFG from a general RNN (though some works extract push-down automata [23,9] from RNNs with an external stack, they do not apply to plain RNNs). The overall steps in our technique are given in Figure 1.

Contributions. The main contributions of this paper are:


**–** An implementation of our technique<sup>1</sup>, and an evaluation of its success on recovering various CFLs from trained RNNs.

### **2 Definitions and Notations**

#### **2.1 Deterministic Finite Automata**

**Definition 1 (Deterministic Finite Automata).** A deterministic finite automaton (DFA) over an alphabet Σ is a 5-tuple ⟨Σ, q0, Q, F, δ⟩ such that Q is a finite set of states, q0 ∈ Q is the initial state, F ⊆ Q is a set of final (accepting) states, and δ : Q × Σ → Q is a (possibly partial) transition function.

Unless stated otherwise, we assume each DFA's states are unique to itself, i.e., for any two DFAs A, B (including two instances of the same DFA) Qᴬ ∩ Qᴮ = ∅. A DFA A is said to be complete if δ is complete, i.e., the value δ(q, σ) is defined for every (q, σ) ∈ Q × Σ. Otherwise, it is incomplete.

We define the extended transition function δ̂ : Q × Σ* → Q and the language L(A) accepted by A in the typical fashion. We also associate a language with intermediate states of A: L(A, q1, q2) ≜ {w ∈ Σ* | δ̂(q1, w) = q2}. The states from which no sequence w ∈ Σ* is accepted are known as the sink reject states.
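A minimal encoding of a possibly-partial DFA with its extended transition function can be sketched as follows (the class layout is ours, not the authors' implementation):

```python
class DFA:
    """A possibly-partial DFA ⟨Σ, q0, Q, F, δ⟩ as in Definition 1 (a sketch)."""
    def __init__(self, sigma, q0, states, finals, delta):
        self.sigma = set(sigma)
        self.q0 = q0
        self.states = set(states)
        self.finals = set(finals)
        self.delta = dict(delta)  # (state, symbol) -> state; may be partial

    def run(self, w, start=None):
        # Extended transition function delta-hat; None on a missing transition.
        q = self.q0 if start is None else start
        for a in w:
            if (q, a) not in self.delta:
                return None
            q = self.delta[(q, a)]
        return q

    def accepts(self, w):
        return self.run(w) in self.finals
```

A missing transition behaves like a move into the sink reject state of the completion, so `run` returning `None` corresponds to rejection.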

**Definition 2.** The sink reject states of a DFA A = ⟨Σ, q0, Q, F, δ⟩ are the maximal set Qᴿ ⊆ Q satisfying: Qᴿ ∩ F = ∅, and for every q ∈ Qᴿ and σ ∈ Σ, either δ(q, σ) ∈ Qᴿ or δ(q, σ) is not defined.

**Definition 3 (Defined Tokens).** Let A = ⟨Σ, q0, Q, F, δ⟩ be a complete DFA with sink reject states Qᴿ. For every q ∈ Q, its defined tokens are def(A, q) ≜ {σ ∈ Σ | δ(q, σ) ∉ Qᴿ}. When the DFA A is clear from context, we write def(q).
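Both notions can be computed by backward reachability from the final states (a sketch over plain dicts and sets; the function names are ours):

```python
def sink_reject_states(states, finals, delta):
    # Definition 2: states from which no final state is reachable, found by
    # backward reachability from F over the reversed transition relation.
    rev = {}
    for (q, _a), q2 in delta.items():
        rev.setdefault(q2, set()).add(q)
    reach, stack = set(finals), list(finals)
    while stack:
        q = stack.pop()
        for p in rev.get(q, ()):
            if p not in reach:
                reach.add(p)
                stack.append(p)
    return set(states) - reach

def defined_tokens(states, finals, delta, sigma, q):
    # Definition 3: tokens whose transition exists and avoids the sink rejects.
    # (In the completion, an undefined transition leads to a sink reject state.)
    qr = sink_reject_states(states, finals, delta)
    return {a for a in sigma if (q, a) in delta and delta[(q, a)] not in qr}
```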

All definitions for complete DFAs are extended to incomplete DFAs A by considering their completion: an extension of A in which all missing transitions are connected to a (possibly new) sink reject state.

**Definition 4 (Set Representation of** δ**).** A (possibly partial) transition function δ : Q × Σ → Q may be equivalently defined as the set Sδ = {(q, σ, q′) | δ(q, σ) = q′}. We use δ and Sδ interchangeably.

**Definition 5 (Replacing a State).** For a transition function δ : Q × Σ → Q, a state q ∈ Q, and a new state qn ∉ Q, we denote by δ[q←qn] : Q′ × Σ → Q′ the transition function over Q′ = (Q \ {q}) ∪ {qn} and Σ that is identical to δ except that it redirects all transitions into or out of q to be into or out of qn.

<sup>1</sup> The implementation for this paper, and a link to all trained RNNs, is available at https://github.com/tech-srl/RNN to PRS CFG.

#### **2.2 Dyck Languages**

A Dyck language of order N is expressed by the grammar D ::= ε | L<sup>1</sup> D R<sup>1</sup> | ... | L<sup>N</sup> D R<sup>N</sup> | DD with unique symbols L1,...,L<sup>N</sup>, R1,...,R<sup>N</sup>. Common measures of complexity for a Dyck word are its maximum distance (number of characters) between matching delimiters and its embedded depth (number of unclosed delimiters) [19]. We generalize, and refer to Regular Expression Dyck (RE-Dyck) languages as languages expressed by the same CFG, except that each L<sup>i</sup> and each R<sup>i</sup> derives some regular expression.

We present regular expressions as is standard, for example: L({a|b}·c) = {ac, bc}.
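Membership in a Dyck language, together with the two complexity measures above, can be checked with a single stack pass (an illustrative sketch; the function name and return convention are ours):

```python
def dyck_stats(word, pairs):
    # Membership plus max embedded depth and max distance (characters strictly
    # between matching delimiters) for a Dyck word; pairs maps open -> close.
    stack, max_depth, max_dist = [], 0, 0
    for i, c in enumerate(word):
        if c in pairs:                      # opening delimiter
            stack.append((c, i))
            max_depth = max(max_depth, len(stack))
        else:                               # closing (or unknown) symbol
            if not stack or pairs[stack[-1][0]] != c:
                return (False, max_depth, max_dist)
            _, j = stack.pop()
            max_dist = max(max_dist, i - j - 1)
    return (not stack, max_depth, max_dist)
```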

### **3 Patterns**

Patterns are DFAs with a single exit state qX in place of a set of final states, and with no cycles on their initial or exit states unless q0 = qX.

**Definition 6 (Patterns).** A pattern p = ⟨Σ, q0, Q, qX, δ⟩ is a DFA Ap = ⟨Σ, q0, Q, {qX}, δ⟩ satisfying: 1. L(Ap) ≠ ∅, and 2. either q0 = qX, or def(qX) = ∅ and L(Ap, q0, q0) = {ε}. If q0 = qX then p is called circular; otherwise, it is non-circular. Patterns are always given in minimal incomplete presentation.

We refer to a pattern's initial and exit states as its edge states. All the definitions for DFAs apply to patterns through Ap. We denote each pattern p's language Lp ≜ L(p), and if p is marked by some superscript i, we refer to all of its components with superscript i: pⁱ = ⟨Σ, qⁱ₀, Qⁱ, qⁱₓ, δⁱ⟩.

### **3.1 Pattern Composition**

We can compose two non-circular patterns p1, p2 by merging the exit state of p1 with the initial state of p2, creating a new pattern p3 satisfying Lp3 = Lp1 · Lp2.

**Definition 7 (Serial Composition).** Let p1, p2 be two non-circular patterns. Their serial composite is the pattern p1 ◦ p2 = ⟨Σ, q¹₀, Q, q²ₓ, δ⟩ in which Q = Q¹ ∪ Q² \ {q¹ₓ} and δ = δ¹[q¹ₓ←q²₀] ∪ δ². We call q²₀ the join state of this operation.

If we additionally merge the exit state of p2 with the initial state of p1, we obtain a circular pattern p which we call the circular composition of p1 and p2. This composition satisfies Lp = (Lp1 · Lp2)*.

**Definition 8 (Circular Composition).** Let p1, p2 be two non-circular patterns. Their circular composite is the circular pattern p1 ◦c p2 = ⟨Σ, q¹₀, Q, q¹₀, δ⟩ in which Q = Q¹ ∪ Q² \ {q¹ₓ, q²ₓ} and δ = δ¹[q¹ₓ←q²₀] ∪ δ²[q²ₓ←q¹₀]. We call q²₀ the join state of this operation.
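Definitions 5, 7 and 8 can be sketched over transition functions stored as dicts (the dict-based pattern encoding and the state names are our assumptions, for illustration only):

```python
def replace_state(delta, q, qn):
    # Definition 5, delta[q <- qn]: redirect transitions into/out of q to qn.
    return {(qn if s == q else s, a): (qn if t == q else t)
            for (s, a), t in delta.items()}

def serial_compose(p1, p2):
    # Definition 7, serial composition: merge p1's exit with p2's initial state.
    # Patterns are dicts {'q0', 'qX', 'delta'}; state sets assumed disjoint.
    delta = replace_state(p1['delta'], p1['qX'], p2['q0'])
    delta.update(p2['delta'])
    return {'q0': p1['q0'], 'qX': p2['qX'], 'delta': delta}

def circular_compose(p1, p2):
    # Definition 8, circular composition: additionally merge p2's exit
    # with p1's initial state, making the result circular (q0 == qX).
    delta = replace_state(p1['delta'], p1['qX'], p2['q0'])
    delta.update(replace_state(p2['delta'], p2['qX'], p1['q0']))
    return {'q0': p1['q0'], 'qX': p1['q0'], 'delta': delta}
```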

Figure 2 shows three examples of serial and circular compositions of patterns.

Patterns do not carry information about whether or not they have been composed from other patterns. We maintain such information using pattern pairs.

**Fig. 2.** Examples of the composition operator

**Definition 9 (Pattern Pair).** A pattern pair is a pair ⟨P, Pc⟩ of pattern sets such that Pc ⊂ P, and for every p ∈ Pc there exists exactly one pair p1, p2 ∈ P satisfying p = p1 ⋄ p2 for some ⋄ ∈ {◦, ◦c}. We refer to the patterns p ∈ Pc as the composite patterns of ⟨P, Pc⟩, and to the rest as its base patterns.

We will often discuss patterns that have been composed into larger DFAs.

**Definition 10 (Pattern Instances).** Let A = ⟨Σ, qᴬ₀, Qᴬ, F, δᴬ⟩ be a DFA, p = ⟨Σ, q0, Q, qX, δ⟩ be a pattern, and p̂ = ⟨Σ, q′₀, Q′, q′ₓ, δ′⟩ be a pattern 'inside' A, i.e., Q′ ⊆ Qᴬ and δ′ ⊆ δᴬ. We say that p̂ is an instance of p in A if p̂ is isomorphic to p.

A pattern instance in a DFA A is uniquely determined by its structure and initial state: (p, q). If p is a composite pattern with respect to some pattern pair ⟨P, Pc⟩, the join state of its composition within A is also uniquely defined.

**Definition 11.** For every pattern pair ⟨P, Pc⟩, composite pattern p ∈ Pc, DFA A, and initial state q of an instance p̂ of p in A, join(p, q, A) returns the join state of p̂ with respect to its composition in ⟨P, Pc⟩.

### **4 Pattern Rule Sets**

For any infinite sequence S = A1, A2, ... of DFAs satisfying L(Ai) ⊂ L(Ai+1) for all i, we define the language of S as the union of the languages of all these DFAs: L(S) = ∪i L(Ai). Such sequences may be used to express CFLs.

In this work we take a finite sequence A1, A2, ..., A<sup>n</sup> of DFAs, and assume it is a (possibly noisy) finite prefix of an infinite sequence of approximations for a language, as above. We attempt to reconstruct the language by guessing how the

sequence may continue. To allow such generalization, we must make assumptions about how the sequence is generated. For this we introduce pattern rule sets.

Pattern rule sets (PRSs) create sequences of DFAs with a single accepting state. Each PRS is built around a pattern pair ⟨P, Pc⟩, and each rule application connects a new pattern instance to the current DFA Ai, at the join state of a composite pattern inserted into Ai at some earlier point. To define where a pattern can be connected to Ai, we introduce an enabled instance set I.

**Definition 12.** An enabled DFA over a pattern pair ⟨P, Pc⟩ is a tuple ⟨A, I⟩ such that A = ⟨Σ, q0, Q, F, δ⟩ is a DFA and I ⊆ Pc × Q marks enabled instances of composite patterns in A.

Intuitively, for every enabled DFA ⟨A, I⟩ and (p, q) ∈ I, we know: (i) there is an instance of pattern p in A starting at state q, and (ii) this instance is enabled, i.e., we may connect new pattern instances to its join state join(p, q, A).

**Definition 13.** A PRS **P** is a tuple ⟨Σ, P, Pc, R⟩ where ⟨P, Pc⟩ is a pattern pair over the alphabet Σ and R is a set of rules. Each rule has one of the following forms, for some p, p1, p2, p3, pI ∈ P, with p1 and p2 non-circular:

(1) ⊥ → pI
(2) p →c (p1 ⋄ p2) ◦= p3, where p = p1 ⋄ p2 for some ⋄ ∈ {◦, ◦c}, and p3 is circular
(3) p →s (p1 ◦ p2) ◦= p3, where p = p1 ◦ p2 and p3 is non-circular

A PRS derives sequences of enabled DFAs as follows: first, a rule of type (1) creates ⟨A1, I1⟩ according to pI. Then, for every ⟨Ai, Ii⟩, each rule may connect a new pattern instance to Ai, specifically at a state determined by Ii.

**Definition 14 (Initial Composition).** D1 = ⟨A1, I1⟩ is generated from a rule ⊥ → pI as follows: A1 = ApI, and I1 = {(pI, qᴵ₀)} if pI ∈ Pc, otherwise I1 = ∅.

Let Di = ⟨Ai, Ii⟩ be the enabled DFA at step i, and denote Ai = ⟨Σ, q0, Q, F, δ⟩. Note that for A1, |F| = 1, and for all Ai+1, F is unchanged (by the definitions that follow).

Rules of type (1) extend A<sup>i</sup> by grafting a circular pattern to q0, and then enabling that pattern if it is composite.

**Definition 15 (Rules of type** (1)**).** A rule ⊥ → pI with circular pI may extend ⟨Ai, Ii⟩ at the initial state q0 of Ai iff def(q0) ∩ def(qᴵ₀) = ∅. This creates the DFA Ai+1 = ⟨Σ, q0, Q ∪ Qᴵ \ {qᴵ₀}, F, δ ∪ δᴵ[qᴵ₀←q0]⟩. If pI ∈ Pc then Ii+1 = Ii ∪ {(pI, q0)}, else Ii+1 = Ii.

Rules of type (2) graft a circular pattern p3 = ⟨Σ, q³₀, Q³, q³ₓ, δ³⟩ onto the join state qj of an enabled pattern instance p̂ in Ai, by merging q³₀ with qj. In doing so, they also enable the patterns composing p̂, if they are composite.

**Definition 16 (Rules of type** (2)**).** A rule p →c (p1 ⋄ p2) ◦= p3 may extend ⟨Ai, Ii⟩ at the join state qj = join(p, q, Ai) of any instance (p, q) ∈ Ii, provided def(qj) ∩ def(q³₀) = ∅. This creates ⟨Ai+1, Ii+1⟩ as follows: Ai+1 = ⟨Σ, q0, Q ∪ Q³ \ {q³₀}, F, δ ∪ δ³[q³₀←qj]⟩, and Ii+1 = Ii ∪ {(pk, qk) | pk ∈ Pc, k ∈ {1, 2, 3}}, where q1 = q and q2 = q3 = qj.

**Fig. 3.** Structure of DFA after applying rule of type 2 or type 3

Example applications of rule (2) are shown in Figures 3(i) and 3(ii).

We also wish to graft a non-circular pattern p3 between p1 and p2, but this time we must avoid connecting the exit state q³ₓ to qj, lest we loop over p3 multiple times. We therefore replicate the outgoing transitions of qj in p1 ◦ p2 onto the inserted state q³ₓ, so that they may act as the connections back into the DFA.

**Definition 17 (Rules of type** (3)**).** A rule p →s (p1 ◦ p2) ◦= p3 may extend ⟨Ai, Ii⟩ at the join state qj = join(p, q, Ai) of any instance (p, q) ∈ Ii, provided def(qj) ∩ def(q³₀) = ∅. This creates ⟨Ai+1, Ii+1⟩ as follows: Ai+1 = ⟨Σ, q0, Q ∪ Q³ \ {q³₀}, F, δ ∪ δ³[q³₀←qj] ∪ C⟩, where C = {(q³ₓ, σ, δ(qj, σ)) | σ ∈ def(p2, q²₀)}, and Ii+1 = Ii ∪ {(pk, qk) | pk ∈ Pc, k ∈ {1, 2, 3}}, where q1 = q and q2 = q3 = qj.

We call C the connecting transitions. An example of this rule application is depicted in Fig. 3(iii), in which a member of C is labeled 'c'.

Multiple applications of rules of type (3) to the same instance pˆ will create several equivalent states in the resulting DFAs, as all of their exit states will have the same connecting transitions. These states are merged in a minimized representation, as depicted in Diagram (iv) of Figure 3.

We write <sup>A</sup> <sup>∈</sup> <sup>G</sup>(**P**) if there exists a sequence of enabled DFAs derived from **P** s.t. A = A<sup>i</sup> for some A<sup>i</sup> in this sequence.

**Definition 18 (Language of a PRS).** The language of a PRS **P** is the union of the languages of the DFAs it can generate: <sup>L</sup>(**P**) = <sup>∪</sup><sup>A</sup>∈G(**P**)L(A).

#### **4.1 Examples**

Example 1: Let p1 and p2 be the patterns accepting 'a' and 'b' respectively. Consider the PRS Rab with rules ⊥ → p1 ◦ p2 and p1 ◦ p2 →s (p1 ◦ p2) ◦= (p1 ◦ p2). This PRS creates only one sequence of DFAs: once the first rule creates the initial DFA, continuously applying the second rule yields the infinite sequence of DFAs satisfying L(Ai) = {aʲbʲ : 1 ≤ j ≤ i}, and so L(Rab) = {aⁱbⁱ : i > 0}. Figure 2(i) presents A1, while A2 and A3 appear in Figure 4(i). We can substitute any non-circular patterns for p1 and p2, creating the language {xⁱyⁱ : i > 0} for any non-circular-pattern regular expressions x and y.
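The DFA sequence of this example can be reproduced directly (a sketch: we build each Ai explicitly rather than by rule application, and the state-naming scheme is ours):

```python
def build_A(i):
    # Transition dict of A_i for L(A_i) = {a^j b^j : 1 <= j <= i}:
    # a chain p0..pi counting a's, and a shared b-chain where state c(k)
    # needs k more b's before acceptance at c0.
    delta = {}
    for j in range(i):
        delta[('p%d' % j, 'a')] = 'p%d' % (j + 1)
    for j in range(1, i + 1):
        delta[('p%d' % j, 'b')] = 'c%d' % (j - 1)
    for k in range(1, i):
        delta[('c%d' % k, 'b')] = 'c%d' % (k - 1)
    return delta

def accepts(delta, w):
    # Run the partial DFA from p0; c0 is the single accepting state.
    q = 'p0'
    for ch in w:
        if (q, ch) not in delta:
            return False
        q = delta[(q, ch)]
    return q == 'c0'
```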

**Fig. 4.** DFA sequences for Rab and RDyck<sup>2</sup>

Example 2: Let p1, p2, p4, and p5 be the non-circular patterns accepting '(', ')', '[', and ']' respectively. Let p3 = p1 ◦c p2 and p6 = p4 ◦c p5. Let RDyck2 be the PRS containing the rules ⊥ → p3, ⊥ → p6, p3 →c (p1 ◦c p2) ◦= p3, p3 →c (p1 ◦c p2) ◦= p6, p6 →c (p4 ◦c p5) ◦= p3, and p6 →c (p4 ◦c p5) ◦= p6. RDyck2 defines the Dyck language of order 2. Figure 4(ii) shows one of its possible DFA sequences.

### **5 PRS Inference Algorithm**

A PRS can generate a sequence of DFAs defining, in the limit, a context-free language. We are now interested in inverting this process: given a sequence of DFAs generated by a PRS **P**, can we reconstruct **P**? Coupled with an L<sup>∗</sup> extraction of DFAs from a trained RNN, solving this problem will enable us to extract a PRS from an RNN – provided the extraction follows a PRS (as we often find it does).

We present an algorithm for this problem, and show its correctness. In practice the DFAs we are given are not "perfect"; they contain noise that deviates from the PRS. We therefore augment this algorithm, allowing it to operate smoothly even on imperfect DFA sequences created from RNN extraction.

In the following, for each pattern instance p̂ in Ai, we denote by p the pattern that it is an instance of. We use similar notation p̂1, p̂2, and p̂I to refer to specific instances of patterns p1, p2, and pI. Additionally, for each consecutive DFA pair Ai and Ai+1, we refer by p̂3 to the new pattern instance in Ai+1.

Main steps of the inference algorithm. Given a sequence of DFAs S = A1, ..., An, the algorithm infers **P** = ⟨Σ, P, Pc, R⟩ in the following stages:

1. Discover the initial pattern instance p̂I in A1. Insert pI into P and mark p̂I as enabled. Insert the rule ⊥ → pI into R.
2. For each consecutive pair of DFAs Ai, Ai+1 in S:
	- (a) Discover the new pattern instance p̂3 in Ai+1 that extends Ai.
	- (b) If p̂3 starts at the state q0 of Ai+1, then it is an application of a rule of type (1). Insert p3 into P, mark p̂3 as enabled, and add ⊥ → p3 to R.
	- (c) Otherwise (p̂3 does not start at q0), find the unique enabled pattern p̂ = p̂1 ⋄ p̂2 in Ai s.t. p̂3's initial state q is the join state of p̂. Add p1, p2, and p3 to P, p to Pc, and mark p̂1, p̂2, and p̂3 as enabled. If p̂3 is non-circular, add p →s (p1 ◦ p2) ◦= p3 to R; otherwise add p →c (p1 ⋄ p2) ◦= p3.

We now elaborate on how we determine the patterns p̂I, p̂3, and p̂.

**Discovering new patterns p̂I and p̂3.** A1 provides an initial pattern pI. For subsequent DFAs, we need to identify which states in Ai+1 = ⟨Σ, q′₀, Q′, F′, δ′⟩ are 'new' relative to Ai = ⟨Σ, q0, Q, F, δ⟩. From the PRS definitions, we know that there is a subset of states and transitions in Ai+1 that is isomorphic to Ai:

**Definition 19 (Existing states and transitions).** For every q′ ∈ Q′, we say that q′ exists in Ai with parallel state q ∈ Q iff there exists a sequence w ∈ Σ* such that q = δ̂(q0, w), q′ = δ̂′(q′₀, w), and neither is a sink reject state. Additionally, for every q′₁, q′₂ ∈ Q′ with parallel states q1, q2 ∈ Q, we say that (q′₁, σ, q′₂) ∈ δ′ exists in Ai iff (q1, σ, q2) ∈ δ. We denote Ai+1's existing states and transitions by Qᴱ ⊆ Q′ and δᴱ ⊆ δ′, and the new ones as Qᴺ = Q′ \ Qᴱ and δᴺ = δ′ \ δᴱ.

By construction of PRSs, each state in A<sub>i+1</sub> has at most one parallel state in A<sub>i</sub>, which can be found in one simultaneous traversal of the two DFAs.
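The simultaneous traversal can be sketched as follows. The dict-based DFA encoding (`q0`, `delta`, `sink` keys) and all names are illustrative assumptions of ours, not the paper's implementation:

```python
from collections import deque

def parallel_states(A_i, A_next):
    """Map each state of A_next to its parallel state in A_i (Definition 19).

    Each DFA is a dict {'q0': initial state, 'sink': sink reject state,
    'delta': {(state, symbol): state}}. A single simultaneous BFS suffices
    because each state of A_next has at most one parallel state in A_i.
    """
    parallel = {A_next['q0']: A_i['q0']}
    queue = deque([(A_next['q0'], A_i['q0'])])
    while queue:
        qn, qo = queue.popleft()
        for (src, sym), qo2 in A_i['delta'].items():
            if src != qo or qo2 == A_i['sink']:
                continue                          # not from qo, or sink reject
            qn2 = A_next['delta'].get((qn, sym))
            if qn2 is None or qn2 == A_next['sink'] or qn2 in parallel:
                continue                          # no parallel move, or seen
            parallel[qn2] = qo2
            queue.append((qn2, qo2))
    return parallel
```

States of A<sub>i+1</sub> absent from the returned map are the new states Q<sub>N</sub>; δ<sub>N</sub> follows analogously from the unmatched transitions.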

The new states and transitions form a new pattern instance p̂ in A<sub>i+1</sub>, excluding its initial and possibly its exit state. The initial state of p̂ is the existing state q′<sup>s</sup> ∈ Q<sub>E</sub> that has outgoing new transitions. The exit state q′<sup>X</sup> of p̂ is identified by the Exit State Discovery algorithm:


Finally, the new pattern instance is p̂ = ⟨Σ, q′<sup>s</sup>, Q<sub>p</sub>, q′<sup>X</sup>, δ<sub>p</sub>⟩, where Q<sub>p</sub> = Q<sub>N</sub> ∪ {q′<sup>s</sup>, q′<sup>X</sup>} and δ<sub>p</sub> is the restriction of δ<sub>N</sub> to the states of Q<sub>p</sub>.

**Discovering the pattern p̂ (step 2c)** In [27] we show that no two enabled pattern instances in a DFA can share a join state; that if they share any non-edge states, then one is contained in the other; and finally that a pattern's join state is never one of its edge states. This makes finding p̂ straightforward: denoting by q<sub>j</sub> the parallel of p̂<sub>3</sub>'s initial state in A<sub>i</sub>, we seek the enabled composite pattern instance (p, q) ∈ I<sub>i</sub> for which join(p, q, A<sub>i</sub>) = q<sub>j</sub>. If none is present, we seek the only enabled instance (p, q) ∈ I<sub>i</sub> that contains q<sub>j</sub> as a non-edge state but is not yet marked as a composite. (Note that if two enabled instances share a non-edge state, then the containing one is already marked as a composite: otherwise we would not have found and enabled the other.)
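This lookup can be sketched as a two-pass search; the helper names `join` and `non_edge_states` are illustrative assumptions, not the paper's code:

```python
def find_enclosing_pattern(q_j, enabled, join, non_edge_states):
    """Step 2c lookup. `enabled` is a set of (pattern, start_state) instances;
    `join(p, q)` returns the instance's join state, or None if it is not
    marked composite; `non_edge_states(p, q)` returns its internal states.
    """
    # First preference: the composite instance whose join state is q_j.
    for (p, q) in enabled:
        if join(p, q) == q_j:
            return (p, q)
    # Otherwise: the unique enabled, not-yet-composite instance that
    # contains q_j as a non-edge state.
    candidates = [(p, q) for (p, q) in enabled
                  if join(p, q) is None and q_j in non_edge_states(p, q)]
    return candidates[0] if candidates else None
```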

In [27] we define the concept of a minimal generator and prove the following:

**Theorem 1.** Let A<sub>1</sub>, A<sub>2</sub>, ..., A<sub>n</sub> be a finite sequence of DFAs that has a minimal generator **P**. Then the PRS Inference Algorithm will discover **P**.

#### **5.1 Deviations from the PRS framework**

Given a sequence of DFAs generated by the rules of a PRS **P**, the inference algorithm given above will faithfully infer **P**. In practice, however, we want to apply the algorithm to a sequence of DFAs extracted from a trained RNN using the L<sup>∗</sup> algorithm (as in [26]). Such a sequence may contain noise: artifacts of an imperfectly trained RNN, or of the behavior of L<sup>∗</sup>. The major deviations are incorrect pattern creation, simultaneous rule applications, and slow initiation.

Incorrect pattern creation Whether due to inaccuracies in the RNN classification or to artifacts of the L<sup>∗</sup> process, incorrect patterns are often inserted into the DFA sequence. Fortunately, these patterns rarely repeat, so we can distinguish between them and 'legitimate' patterns using a voting and threshold scheme.

The vote for each discovered pattern p ∈ P is the number of times it has been inserted as the new pattern between a pair of DFAs A<sub>i</sub>, A<sub>i+1</sub> in S. We set a threshold for the minimum vote a pattern needs to be considered valid, and only build rules around the connection of valid patterns onto the join states of other valid patterns. To do this, we modify the flow of the algorithm: before discovering rules, we first filter invalid patterns by splitting step 2 into two phases. Phase 1: Mark all the inserted patterns between each pair of DFAs, and compute their votes. Add to P those whose vote is above the threshold. Phase 2: Consider each DFA pair A<sub>i</sub>, A<sub>i+1</sub> in order. If the new pattern in A<sub>i+1</sub> is valid, and its initial state's parallel state in A<sub>i</sub> also lies in a valid pattern, then synthesize the rule according to the original algorithm. If a pattern is discovered to be composite, add its composing patterns to P.
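Phase 1 of this modified flow amounts to a simple counting pass. A minimal sketch, assuming the inserted patterns have already been canonicalised so that equal patterns compare (and hash) equal; the encoding is ours, not the paper's:

```python
from collections import Counter

def filter_patterns(insertions, threshold=2):
    """Vote over all patterns inserted along the DFA sequence.

    `insertions` maps each DFA-pair index i to the canonicalised pattern
    inserted between A_i and A_{i+1}, or None if no pattern was identified
    there. Returns the vote tally and the set of valid patterns.
    """
    votes = Counter(p for p in insertions.values() if p is not None)
    valid = {p for p, v in votes.items() if v >= threshold}
    return votes, valid
```

Phase 2 then walks the DFA pairs in order and synthesizes rules only between patterns in `valid`.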

As almost every DFA sequence produced by our method has some noise, the voting scheme greatly extended the reach of our algorithm.

Simultaneous rule applications In the theoretical framework, A<sub>i+1</sub> differs from A<sub>i</sub> by the application of a single PRS rule, and therefore q′<sup>s</sup> and q′<sup>X</sup> are uniquely defined. L<sup>∗</sup>, however, does not guarantee such minimal increments between DFAs. In particular, it may apply multiple PRS rules between two subsequent DFAs, extending A<sub>i</sub> with several patterns. To handle this, we expand the initial and exit state discovery methods given above.

1. Mark the new states and transitions Q<sub>N</sub> and δ<sub>N</sub> as before.


If Ai+1's new patterns have no overlap and do not create an ambiguity around join states, then they may be handled independently and in arbitrary order. They are used to discover rules and then enabled, as in the original algorithm.

Simultaneous but dependent rule applications – such as inserting a pattern and then grafting another onto its join state – are more difficult to handle, as it is not always possible to determine which pattern was grafted onto which. However, there is a special case which appeared in several of our experiments (examples L<sub>13</sub> and L<sub>14</sub> of Section 7) for which we developed the following technique.

Suppose we discover a rule r<sub>1</sub> : p<sub>0</sub> →<sup>s</sup> (p<sub>l</sub> ◦ p<sub>r</sub>) ◦= p, and p contains a cycle c around some internal state q<sub>j</sub>. If later another rule inserts a pattern p<sub>n</sub> at the state q<sub>j</sub>, we understand that p is in fact a composite pattern, with p = p<sub>1</sub> ◦ p<sub>2</sub> and join state q<sub>j</sub>. However, as patterns do not contain cycles at their edge states, c cannot be a part of either p<sub>1</sub> or p<sub>2</sub>. We conclude that the addition of p was in fact a simultaneous application of two rules: r′<sub>1</sub> : p<sub>0</sub> →<sup>s</sup> (p<sub>l</sub> ◦ p<sub>r</sub>) ◦= p′ and r<sub>2</sub> : p′ →<sup>c</sup> (p<sub>1</sub> ◦ p<sub>2</sub>) ◦= c, where p′ is p without the cycle c, and we update our PRS and our DFAs' enabled pattern instances accordingly. The case when p is circular and r<sub>1</sub> is of rule type (2) is handled similarly.

Slow initiation Ideally, A<sub>1</sub> directly supplies an initial rule ⊥ → p<sub>I</sub> to our PRS. In practice, the first few DFAs generated by L<sup>∗</sup> have almost random structure. We solve this by leaving the discovery of initial rules to the end of the algorithm, at which point we have a set of 'valid' patterns that we are sure are part of the PRS. We then examine the last DFA A<sub>n</sub> generated in the sequence, note all the enabled instances (p<sub>I</sub>, q<sub>0</sub>) at its initial state, and generate a rule ⊥ → p<sub>I</sub> for each of them. This technique has the weakness that it will not recognise patterns p<sub>I</sub> that do not also appear as extending patterns p<sub>3</sub> elsewhere in the sequence, unless the threshold for patterns is minimal.

### **6 Converting a PRS to a CFG**

We present an algorithm to convert a given PRS to a context-free grammar (CFG), making the rules extracted by our algorithm more accessible.

A restriction: Let **P** = ⟨Σ, P, P<sub>c</sub>, R⟩ be a PRS. For simplicity, we restrict the PRS so that every pattern p may appear on the LHS of rules of type (2) only, or of rules of type (3) only, but not on the LHS of both types of rules. Similarly, we assume that for each rule ⊥ → p<sub>I</sub>, the RHS patterns p<sub>I</sub> are all circular or all non-circular. This restriction is natural: all of the examples

in Sections 4.1 and 7.3 conform to it. Still, in [27] we show how to remove this restriction.

We create a CFG G = ⟨Σ, N, S, Prod⟩. Σ is the same alphabet as that of **P**, and we take S as a special start symbol. For every pattern p ∈ P, let G<sub>p</sub> = ⟨Σ<sub>p</sub>, N<sub>p</sub>, Z<sub>p</sub>, Prod<sub>p</sub>⟩ be a CFG describing L(p). Let P<sub>Y</sub> ⊆ P<sub>C</sub> be those composite patterns that appear on the LHS of a rule of type (2). Create the non-terminal S<sub>C</sub>, and for each p ∈ P<sub>Y</sub> create an additional non-terminal C<sub>p</sub>. We set N = {S, S<sub>C</sub>} ∪ ⋃<sub>p∈P</sub> N<sub>p</sub> ∪ ⋃<sub>p∈P<sub>Y</sub></sub> {C<sub>p</sub>}.

Let ⊥ → p<sub>I</sub> be a rule in **P**. If p<sub>I</sub> is non-circular, create a production S ::= Z<sub>p<sub>I</sub></sub>. If p<sub>I</sub> is circular, create the productions S ::= S<sub>C</sub>, S<sub>C</sub> ::= S<sub>C</sub> S<sub>C</sub>, and S<sub>C</sub> ::= Z<sub>p<sub>I</sub></sub>. For each rule p →<sup>s</sup> (p<sub>1</sub> ◦ p<sub>2</sub>) ◦= p<sub>3</sub> create a production Z<sub>p</sub> ::= Z<sub>p<sub>1</sub></sub> Z<sub>p<sub>3</sub></sub> Z<sub>p<sub>2</sub></sub>. For each rule p →<sup>c</sup> (p<sub>1</sub> ◦ p<sub>2</sub>) ◦= p<sub>3</sub> create productions Z<sub>p</sub> ::= Z<sub>p<sub>1</sub></sub> C<sub>p</sub> Z<sub>p<sub>2</sub></sub>, C<sub>p</sub> ::= C<sub>p</sub> C<sub>p</sub>, and C<sub>p</sub> ::= Z<sub>p<sub>3</sub></sub>. Let Prod′ be all the productions defined by the above process. We set Prod = ⋃<sub>p∈P</sub> Prod<sub>p</sub> ∪ Prod′.
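The production-building step above can be sketched as follows; the tuple encoding of rules and the `Z_p`/`C_p` naming convention are illustrative assumptions of ours, and the sub-grammars G<sub>p</sub> are assumed built elsewhere:

```python
def prs_to_cfg(initial_rules, seq_rules, circ_rules):
    """Sketch of the PRS-to-CFG construction (Section 6).

    initial_rules: list of (p_I, is_circular) for rules of the form _|_ -> p_I.
    seq_rules:     list of (p, p1, p2, p3) for sequential rules p ->s (p1.p2) with p3.
    circ_rules:    list of (p, p1, p2, p3) for circular rules p ->c (p1.p2) with p3.
    Returns productions as (lhs, rhs_list) pairs; Z(p) stands for the start
    nonterminal of the sub-grammar G_p of L(p).
    """
    Z = lambda p: f"Z_{p}"
    prods = []
    for pI, circular in initial_rules:
        if not circular:
            prods.append(("S", [Z(pI)]))
        else:
            prods += [("S", ["S_C"]), ("S_C", ["S_C", "S_C"]), ("S_C", [Z(pI)])]
    for p, p1, p2, p3 in seq_rules:        # Z_p ::= Z_p1 Z_p3 Z_p2
        prods.append((Z(p), [Z(p1), Z(p3), Z(p2)]))
    for p, p1, p2, p3 in circ_rules:       # Z_p ::= Z_p1 C_p Z_p2; C_p repeats Z_p3
        C = f"C_{p}"
        prods += [(Z(p), [Z(p1), C, Z(p2)]), (C, [C, C]), (C, [Z(p3)])]
    return prods
```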

**Theorem 2.** Let G and **P** be as above. Then L(**P**) = L(G).

The proof is given in the extended version of this paper [27].

Expressibility Every RE-Dyck language (Section 2.2) can be expressed by a PRS, but the converse is not true; RE-Dyck languages nest delimiters arbitrarily, while PRS grammars may not. For instance, language L12 of Section 7.3 is not a Dyck language. Meanwhile, not every CFL can be expressed by a PRS [27].

Succinctness The construction above does not necessarily yield a minimal CFG G. For a PRS defining the Dyck language of order 2 – which can be expressed by a CFG with 4 productions and 1 non-terminal – our construction yields a CFG with 10 non-terminals and 12 productions. In this case, and often in others, we can recognise and remove the spurious productions from the generated grammar.

### **7 Experimental results**

#### **7.1 Methodology**

We test the algorithm on several PRS-expressible context free languages, attempting to extract them from trained RNNs using the process outlined in Figure 1. For each language, we create a probabilistic CFG generating it, train an RNN on samples from this grammar, extract a sequence of DFAs from the RNN, and apply our PRS inference algorithm. Finally, we convert the extracted PRS back to a CFG, and compare it to our target CFG.

In all of our experiments, we use a vote threshold s.t. patterns with fewer than 2 votes are not used to form any PRS rules (Section 5.1). Using no threshold significantly degraded the results by including too much noise, while higher thresholds often caused us to overlook correct patterns and rules.

#### **7.2 Generating a sequence of DFAs**

We obtain a sequence of DFAs for a given CFG using only positive samples [11,1], by training a language-model RNN (LM-RNN) on these samples and then extracting DFAs from it with the aid of the L<sup>∗</sup> algorithm [2], as described in [26]. To apply L<sup>∗</sup>, we must treat the LM-RNN as a binary classifier. We set an 'acceptance threshold' t and define the RNN's language as the set of sequences s satisfying: 1. the RNN's probability for an end-of-sequence token after s is greater than t, and 2. at no point during s does the RNN pass through a token with probability < t. This is identical to the concept of locally t-truncated support defined in [13].
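The binary-classifier view can be sketched as follows, assuming a hypothetical `probs_fn` that returns the LM-RNN's next-token distribution for a given prefix (the name and interface are ours, for illustration only):

```python
def rnn_accepts(probs_fn, seq, eos, t=0.01):
    """Treat a language-model RNN as a binary classifier via an acceptance
    threshold t (locally t-truncated support, [13]).

    probs_fn(prefix) -> dict mapping next tokens to probabilities.
    Accept iff every token of `seq` has probability >= t at its position
    (condition 2) and the end-of-sequence token has probability > t after
    the full sequence (condition 1).
    """
    prefix = []
    for tok in seq:
        if probs_fn(prefix).get(tok, 0.0) < t:   # condition 2 violated
            return False
        prefix.append(tok)
    return probs_fn(prefix).get(eos, 0.0) > t    # condition 1
```

L<sup>∗</sup> membership queries are then answered by `rnn_accepts`.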

To create the samples for the RNNs, we write a weighted version of the CFG, in which each non-terminal is given a probability distribution over its rules. We then take N samples from the weighted CFG according to its distribution, split them into train and validation sets, and train an RNN on the train set until the validation loss stops improving. In our experiments, we used N = 10,000. For our languages, we used very small 2-layer LSTMs: hidden dimension 10 and input dimension 4.
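Sampling from the weighted CFG can be sketched as a simple recursive procedure; the PCFG encoding below is illustrative, not the paper's tooling:

```python
import random

def sample(pcfg, symbol="S", depth=0, max_depth=50):
    """Draw one word from a weighted CFG.

    `pcfg` maps each non-terminal to a list of (probability, right-hand side)
    pairs, where a right-hand side is a list of symbols; any symbol not in
    `pcfg` is treated as a terminal. A depth guard crudely bounds runaway
    recursion; callers may simply resample on RecursionError.
    """
    if symbol not in pcfg:                 # terminal symbol
        return symbol
    if depth > max_depth:
        raise RecursionError("expansion too deep; resample")
    r, acc = random.random(), 0.0
    for prob, rhs in pcfg[symbol]:
        acc += prob
        if r <= acc:
            return "".join(sample(pcfg, s, depth + 1, max_depth) for s in rhs)
    # fall through on floating-point slack: take the last rule
    return "".join(sample(pcfg, s, depth + 1, max_depth)
                   for s in pcfg[symbol][-1][1])
```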

In some cases, especially when all of the patterns in the rules are several tokens long, the extraction of [26] terminates too soon: neither L<sup>∗</sup> nor the RNN abstraction considers long sequences, and equivalence is reached between the L<sup>∗</sup> hypothesis and the RNN abstraction despite neither being equivalent to the 'true' language of the RNN. In these cases we push the extraction a little further using two methods: first, if the RNN abstraction contains only a single state, we make an arbitrary initial refinement by splitting 10 hidden dimensions, and restart the extraction. If this is still not enough, we sample the RNN according to its distribution, in the hope of finding a counterexample to return to L<sup>∗</sup>. The latter approach is not ideal: sampling the RNN may return very long sequences, effectively increasing the next DFA by many rule applications. We place a time limit of 1,000 seconds (∼17 minutes) on the extraction.

#### **7.3 Languages**

We experiment on 15 PRS-expressible languages L<sub>1</sub>–L<sub>15</sub>, grouped into 3 classes:



**Table 1.** Results of experiments on DFAs extracted from RNNs

#### **7.4 Results**

Table 1 shows the results. The 2nd column shows the number of DFAs extracted from the RNN. The 3rd and 4th columns present the number of patterns found by the algorithm before and after applying vote-thresholding to remove noise. The 5th column gives the minimum and maximum votes received by the final patterns (we count only patterns introduced as a new pattern p<sub>3</sub> in some A<sub>i+1</sub>). The 6th column notes whether the algorithm found a correct CFG, according to our manual inspection. For languages where our algorithm missed or included only 1 or 2 valid/invalid productions, we label the result as partially correct.

Alternating Patterns Our algorithm struggled on the languages L<sub>3</sub>, L<sub>6</sub>, and L<sub>11</sub>, which contain patterns whose regular expressions have alternations (such as ab|cd in L<sub>3</sub>, and ab|c in L<sub>6</sub> and L<sub>11</sub>). Investigating their DFA sequences uncovered that the L<sup>∗</sup> extraction had 'split' the alternating expressions, adding their parts to the DFAs over multiple iterations. For example, in the sequence generated for L<sub>3</sub>, ef appeared in A<sub>7</sub> without gh alongside it. The next DFA corrected this mistake, but the inference algorithm could not piece these two separate steps together into a single rule. It would be valuable to extend the algorithm to these cases.

Simultaneous Applications Originally, our algorithm failed to accurately generate L<sub>13</sub> and L<sub>14</sub> due to simultaneous rule applications. Using the technique described in Section 5.1, we were able to correctly infer these grammars, though more work is needed to handle simultaneous rule applications in general.

Additionally, sometimes a very large counterexample was returned to L<sup>∗</sup>, creating a large increase in the DFAs: the 9th iteration of the extraction on L<sub>3</sub> introduced almost 30 new states. The algorithm does not manage to infer anything meaningful from these nested, simultaneous applications.

Missing Rules For the Dyck languages L<sub>7</sub>–L<sub>9</sub>, the inference algorithm was mostly successful. However, due to the large number of possible delimiter combinations, some patterns and nesting relations did not appear often enough in the DFA sequences. As a result, for L<sub>8</sub>, some productions were missing from the generated grammar. L<sub>8</sub> also produced one incorrect production due to noise in the sequence (one erroneous pattern was generated twice, passing the threshold).

RNN Noise In L<sub>15</sub>, the extracted DFAs for some reason always forced a single character d to appear between every pair of delimiters, and our inference algorithm naturally preserved this peculiarity. It correctly allowed the optional embedding of "abc" strings, but due to noisy (incorrect) extracted DFAs, the generated patterns did not maintain balanced parentheses.

### **8 Related work**

Training RNNs to recognize Dyck Grammars. Recently there has been a surge of interest in whether RNNs can learn Dyck languages [5,19,21,28]. While these works report very good results on learning the language for sentences of distance and depth similar to those in the training set, they report (with the exception of [21]) significantly lower accuracy for out-of-sample sentences.

Among these, Sennhauser and Berwick [19] use LSTMs, and show that in order to keep the error rate within a 5 percent tolerance, the number of hidden units must grow exponentially with the distance or depth of the sequences (though Hewitt et al. [13] find much lower theoretical bounds). They conclude that LSTMs do not learn rules, but rather statistical approximations. Bernardy [5] experimented with various RNN architectures, finding in particular that the LSTM has more difficulty in predicting closing delimiters in the middle of a sentence than at the end. Based on this, he conjectures that the RNN is using a counting mechanism, but has not truly learnt the Dyck language (its CFG). For the simplified task of predicting only the final closing delimiter of a legal sequence, Skachkova, Trost and Klakow [21] find that LSTMs have nearly perfect accuracy across words with large distances and embedded depth.

Yu, Vu and Kuhn [28] compare the three works above, and note that the task of predicting only the closing bracket of a balanced Dyck word is not sufficient for checking whether an RNN has learnt the language, as it can be computed by a counter alone. In their experiments, they present a prefix of a Dyck word and train the RNN to predict the next valid closing bracket. They experiment with an LSTM using 4 different models, and show that the generator-attention model [17] performs the best, and is able to generalize quite well at the tagging task. However, they find that it degrades rapidly with out-of-domain tests. They also conclude that RNNs do not really learn the Dyck language. These experimental results are reinforced by the theoretical work in [13], whose authors remark that no finite-precision RNN can learn a Dyck language of unbounded depth, and give precise bounds on the memory required to learn a Dyck language of bounded depth.

Despite these findings, our algorithm nevertheless extracts a CFG from a trained RNN, discovering rules based on DFAs synthesized from the RNN using the algorithm in [26]. Because we can use a short sequence of DFAs to extract the rules, and because the first DFAs in the sequence describe Dyck words with

increasing but limited distance and depth, we are often able to extract the CFG perfectly even when the RNN does not generalize well. Moreover, we show that our approach works with more complex types of delimiters, and on Dyck languages with expressions between delimiters.

Extracting DFAs from RNNs. There have been many approaches to extract higher level representations from a neural network (NN), both to facilitate comprehension and to verify correctness. One of the oldest approaches is to extract rules from a NN [24,12]. In particular, several works attempt to extract FSAs from RNNs [18,15,25]. We base our work on [26]. Its ability to generate sequences of DFAs providing increasingly better approximations of the CFL is critical to our method.

There has been less research on extracting a CFG from an RNN. One exception is [23], which develops a Neural Network Pushdown Automaton (NNPDA) framework, a hybrid system augmenting an RNN with external stack memory. They show how to extract a push-down automaton from an NNPDA; however, their technique relies on the PDA-like structure of the inspected architecture. In contrast, we extract CFGs from RNNs without stack augmentation.

Learning CFGs from samples. There is a wide body of work on learning CFGs from samples. An overview is given in [10] and a survey of work for grammatical inference applied to software engineering tasks can be found in [22].

Clark et al. study algorithms for learning CFLs given only positive examples [11]. In [7], Clark and Eyraud show how one can learn a subclass of CFLs called CF substitutable languages. There are many languages that can be expressed by a PRS but are not substitutable, such as x<sup>n</sup>b<sup>n</sup>. However, there are also substitutable languages that cannot be expressed by a PRS (wxw<sup>R</sup>; see [27]). In [8], Clark, Eyraud and Habrard present Contextual Binary Feature Grammars; however, this class does not include Dyck languages of arbitrary order. None of these techniques deals with noise in the data, which is essential to learning a language from an RNN.

### **9 Future Directions**

Currently, for each experiment, we train the RNN on that language and then apply the PRS inference algorithm on a single DFA sequence generated from that RNN. Perhaps the most substantial improvement we can make is to extend our technique to learn from multiple DFA sequences. We can train multiple RNNs and generate DFA sequences for each one. We can then run the PRS inference algorithm on each of these sequences, and generate a CFG based upon rules that are found in a significant number of the runs. This would require care to guarantee that the final rules form a cohesive CFG. It would also address the issue that not all rules are expressed in a single DFA sequence, and that some grammars may have rules that are executed only once per word of the language.

Our work generates CFGs for generalized Dyck languages, but it is possible to generalize PRSs to express a greater range of languages. Work will then be needed to extend the PRS inference algorithm.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Automated and Formal Synthesis of Neural Barrier Certificates for Dynamical Models

Andrea Peruffo<sup>1</sup>, Daniele Ahmed<sup>2</sup>, Alessandro Abate<sup>1</sup>

<sup>1</sup> Department of Computer Science, University of Oxford, Oxford, UK {name.surname}@cs.ox.ac.uk <sup>2</sup> Amazon Inc, London, UK

Abstract. We introduce an automated, formal, counterexample-based approach to synthesise Barrier Certificates (BC) for the safety verification of continuous and hybrid dynamical models. The approach is underpinned by an inductive framework: this is structured as a sequential loop between a learner, which manipulates a candidate BC structured as a neural network, and a sound verifier, which either certifies the candidate's validity or generates counter-examples to further guide the learner. We compare the approach against state-of-the-art techniques, over polynomial and non-polynomial dynamical models: the outcomes show that we can synthesise sound BCs up to two orders of magnitude faster, with in particular a stark speedup on the verification engine (up to three orders less), whilst needing a far smaller data set (up to three orders less) for the learning part. Beyond improvements over the state of the art, we further challenge the new approach on a hybrid dynamical model and on larger-dimensional models, and showcase the numerical robustness of our algorithms and codebase.

### 1 Introduction

Barrier Certificates (BC) are an effective and powerful technique to prove safety properties on models of continuous dynamical systems, as well as hybrid models (featuring both continuous and discrete states) [21,22]. Whenever found, a BC partitions the state space of the model into two parts, ensuring that all trajectories starting from a given initial set, located within one side of the BC, cannot reach a given set of states (deemed to be unsafe), located on the other side. Thus a successful synthesis of a BC (which is in general not a unique object) represents a formal proof of safety for the dynamical model. BC find various applications spanning robotics, multi-agent systems, and biology [7,32].

This work addresses the safety of dynamical systems modelled in general by non-linear ordinary differential equations (ODE), and presents a novel method for the automated and formal synthesis of BC. The approach leverages Satisfiability Modulo Theory (SMT) and inductive reasoning (CEGIS, Figure 1, introduced later), to guarantee the correctness of the automated synthesis procedure: this rules out both algorithmic and numerical errors related to BC synthesis [10].

© The Author(s) 2021

J. F. Groote and K. G. Larsen (Eds.): TACAS 2021, LNCS 12651, pp. 370–388, 2021. https://doi.org/10.1007/978-3-030-72016-2\_20

*Background and Related Work* A few techniques have been developed to synthesise BC. For polynomial models, sum-of-squares (SOS) and semi-definite programming relaxations [14,16,29] convert the BC synthesis problem into constraints expressed as linear or bilinear matrix inequalities: these are numerically solved as a convex optimisation problem, however unsoundly. To increase scalability and to enhance expressiveness, numerous barrier formats have been considered: BC based on exponential conditions are presented in [14]; BC based on Darboux polynomials are outlined in [33]; [30] newly introduces a multi-dimensional generalisation of BC, thus broadening their scope and applicability. BC can also be used to verify safety of uncertain (e.g. parametric) models [20]. Let us remark that SOS approaches are typically *unsound*, namely they rely on iterative and numerical methods to synthesise the BC. [10] a-posteriori verifies SOS candidates via computer-aided design (CAD) techniques [15].

Model *invariants* (namely, regions that provably contain model trajectories, such as *basins of attraction* [28]) can be employed as BC, though their synthesis is less general, as it does not comprise an unsafe set and tacitly presupposes the initial set to be "well placed" within the state space (that is, within the aforementioned basin): [19] introduces a fixpoint algorithm to find algebraic-differential invariants for hybrid models; invariants can be characterised analytically [4] or synthesised computationally [8]. Invariants can be alternatively studied by *Lyapunov theory* [5], which provides *stability* guarantees for dynamical models, and thus can characterise invariants (and barriers) as side products: however this again requires that initial conditions are positioned around stable equilibria, and does not explicitly encompass unsafe sets in the synthesis. Whilst Lyapunov theory is classically approached either analytically (explicit synthesis) or numerically (with unsound techniques), an approach that is relevant for the results of this work looks at automated and sound Lyapunov function synthesis: in [27] Lyapunov functions are soundly found within parametric templates, by constructing a system of linear inequality constraints over unknown coefficients. [23,24,25] employ a counterexample-based approach to synthesise control Lyapunov functions, which inspires this work, using a combination of SMT solvers and convex optimisation engines: however unlike this work, SMT solvers are never used for verification, which is instead handled by solving optimisation problems that are numerically unsound. As argued above, let us emphasise again that the BC synthesis problem, as studied in this work, cannot in general be reduced to a problem of Lyapunov stability analysis, and is indeed more general.

Fig. 1. Schematic representation of the CEGIS loop.

*Core approach* We introduce a method that efficiently exploits machine learning, whilst guaranteeing formal proofs of correctness via SMT. We leverage a CounterExample-Guided Inductive Synthesis (CEGIS) procedure [31], which is structured as an inductive loop between a *Learner* and a *Verifier* (cf. Fig. 1). A learner numerically (and unsoundly) trains a neural network (NN) to fit over a finite set of samples the requirements for a BC, which are expressed through a loss function; then a verifier either formally proves the validity of the BC or provides (a) counter-example(s) through an SMT solver: the counter-examples indicate where the barrier conditions are violated, and are passed back to the learner for further training. This synthesis method for neural BC is formally sound and fully automated, and thanks to its specific new features, is shown to be much faster and to clearly require less data than state-of-the-art results.
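The loop can be sketched as follows; the `learner`/`verifier` interfaces are illustrative assumptions of ours, not the actual API of the authors' codebase:

```python
def cegis(learner, verifier, data, max_iters=100):
    """Skeleton of the CEGIS loop of Fig. 1.

    `learner.fit(data)` is assumed to return a candidate BC (e.g. a trained
    neural network); `verifier.check(candidate)` is assumed to return
    (True, []) on success, or (False, counterexamples) obtained from an SMT
    solver where the barrier conditions are violated.
    """
    for _ in range(max_iters):
        candidate = learner.fit(data)            # numerical, unsound training
        valid, cex = verifier.check(candidate)   # sound SMT-based verification
        if valid:
            return candidate                     # formally certified BC
        data = data + cex                        # counterexamples guide retraining
    return None                                  # no certificate within budget
```

The division of labour is the key design point: the learner may be arbitrarily unsound, since every returned candidate has passed the sound verifier.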

*Contributions beyond the State of the Art* Cognate work [34] presents a method to compute BC using neural networks and to verify their correctness a-posteriori: as such, it does not generate counter-examples within an inductive loop, as in this work. [34] considers large sample sets that are randomly divided into batches and fed to a feed-forward NN; the verification at the end of the (rather long) training either validates the candidate, or invalidates it and the training starts anew on the same dataset. In Section 4 the method in [34] is shown to be slower (both in the training and in the verification), and to require more data than the CEGIS-based approach of this work, which furthermore introduces numerous bespoke optimisations, as outlined in Section 3: our CEGIS-based technique exploits fast learning, verification simplified by the candidates passed by the Learner, and an enhanced communication between Learner and Verifier. Our approach further showcases numerical robustness and scalability features.

Related to the work on BCs is the synthesis of Lyapunov functions, mentioned above. The construction of *Lyapunov Neural Networks* (LNNs) has been studied with approaches based on simulations and numerical optimisation, which are in general unsound [26]. Formal methods for Lyapunov synthesis are introduced in [5], together with a counterexample-based approach using polynomial candidates. The work is later extended in [2], which employs NNs as candidates over polynomial dynamical models. The generation of control Lyapunov functions using counterexample-based NNs is similarly considered in [9]; however, that work relies on different architectural details and does not extend to BC synthesis. Beyond the work in [5], this contribution is not limited to a specific polynomial template, since it supports more general mixtures of polynomial functions obtained through the NN structure, as well as the canonical tanh, sigmoid and ReLU activations (we provide one example of a BC using tanh activations). Compared to [5], where we use linear programming to synthesise Lyapunov functions, in this work: a) we use a template-free procedure, thanks to the integration of NNs - these are needed since template-based SOS-programming approaches are not sufficient to provide BCs for several of the presented benchmarks (see Section 4 and [34]); b) we provide an enhanced loss function (absent from [5]), enriched counter-example generation, and a prioritised check of the verification constraints; and c) we newly synthesise verified barrier certificates for hybrid models, generated using counterexample-based neural architectures. Finally, beyond [5], the new approach is endowed with numerical robustness features.

SOS programming solutions [14,16,29] are not directly comparable to this work. Foremost, they are not sound, i.e. they do not offer a formal guarantee of numerical and algorithmic correctness. The exception is [10], which verifies SOS candidates a posteriori by means of CAD [15] techniques, which are known not to scale well. Furthermore, SOS candidates can hardly be embedded within a CEGIS loop - we experimentally show that they are handled with difficulty by SMT solvers. Finally, they hardly cope with the experiments we have considered, as already observed in [34]. We instead use SMT solvers (Z3 [11] and dReal [13]) within CEGIS to provide sound outcomes based on NN candidates, proffering a new approach that synthesises and formally verifies candidate BCs altogether, with minimal effort from the user.

*Organisation* The remainder of the paper is organised as follows: Section 2 presents preliminary notions on BCs and outlines the problem. Section 3 describes the approach: training of the NN in Sec. 3.1 and verification in Sec. 3.2. Section 4 presents case studies, and Section 5 delineates future work.

### 2 Safety Analysis with Barrier Certificates

We address the safety verification of continuous-time dynamical models by designing barrier certificates (BC) over the continuous state space X of the model. We consider n-dimensional dynamical models described by

$$
\dot{x}(t) = \frac{dx}{dt} = f(x), \quad x(0) = x\_0 \in X\_0 \subset X,\tag{1}
$$

where f : X → R^n is a continuous vector field, X ⊆ R^n is an open set defining the state space of the system, and X0 represents the set of initial states. Given model (1) and an unsafe set Xu ⊂ X, the safety verification problem concerns checking whether any trajectory of the model originating from X0 reaches the unsafe region Xu. BCs offer a sufficient condition asserting the safety of the model, namely that no trajectory enters the unsafe region.

Definition 1. *The Lie derivative of a continuously differentiable scalar function* B : X → R*, with respect to a vector field* f*, is defined as follows*

$$\dot{B}(x) = \nabla B(x) \cdot f(x) = \sum\_{i=1}^{n} \frac{\partial B}{\partial x\_i} \frac{dx\_i}{dt} = \sum\_{i=1}^{n} \frac{\partial B}{\partial x\_i} f\_i(x). \tag{2}$$

*Intuitively, this derivative denotes the rate of change of function* B *along the model trajectories.*
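For concreteness, the Lie derivative of Eq. (2) can be approximated numerically. The sketch below is illustrative only (the chosen B and f are hand-picked examples, not from the paper) and uses central differences to estimate the gradient of B:

```python
def lie_derivative(B, f, x, h=1e-6):
    """Approximate Bdot(x) = grad B(x) . f(x) via central differences."""
    fx = f(x)
    total = 0.0
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        total += (B(xp) - B(xm)) / (2 * h) * fx[i]
    return total

# Example: B(x, y) = x^2 + y^2 along the rotation field f(x, y) = (-y, x);
# trajectories are circles of constant B, so the Lie derivative vanishes.
B = lambda p: p[0] ** 2 + p[1] ** 2
f = lambda p: (-p[1], p[0])
print(lie_derivative(B, f, [1.0, 2.0]))  # approximately 0
```

Here the rate of change of B along a trajectory is recovered without integrating the dynamics, exactly as the definition states.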

Proposition 1 (Barrier Certificate for Safety Verification, [21]). *Let the model in* (1) *and the sets* X*,* X0 *and* Xu *be given. Suppose there exists a function* B : X → R *that is differentiable with respect to its argument and satisfies the following conditions:*

$$B(x) \le 0 \,\,\forall x \in X\_0, \quad B(x) > 0 \,\,\forall x \in X\_u, \quad \dot{B}(x) < 0 \,\,\forall x \in X \,\,\,s.t. \,\,B(x) = 0,\tag{3}$$

*then the safety of the model is guaranteed. That is, no trajectory of the model contained in* X*, starting from the initial set* X0*, ever enters the set* Xu*.*

Consider a trajectory x(t) starting at x0 ∈ X0, and the evolution of B(x(t)) along this trajectory. Whilst the first of the three conditions guarantees that B(x0) ≤ 0, the last condition asserts that B(x(t)) must decrease whenever it reaches the level set B(x) = 0. Hence the trajectory can never cross into the set where B(x) > 0, which contains Xu (second condition), thus ensuring the safety of the model.

### 3 Synthesis of Neural Barrier Certificates via Learning and Verification

We introduce an automated and formal approach for the construction of barrier certificates (BCs) that are expressed as feed-forward neural networks (NNs). The procedure leverages CEGIS (see Fig. 1) [31], an automated and sound procedure for solving second-order logic synthesis problems, which comprises two interacting parts. The first component is a *Learner*, which provides candidate BC functions by training a NN over a finite set of sample inputs. The network is then translated into a logical formula in an appropriate theory, by evaluating it over symbolic inputs instead of canonical floating-point numbers; the details of this conversion are outlined in [2]. This encoded candidate is passed to the second component, a *Verifier*, which acts as an oracle: either it proves that the solution is valid, or it finds one (or more) instances (called counter-examples) where the candidate BC does not comply with the required conditions. The verifier consists of an SMT solver [15], namely an algorithmic decision procedure that extends Boolean SAT problems to richer, more expressive theories, such as non-linear arithmetic.

More precisely, the learner trains a NN composed of n input neurons (matching the dimension of the model f), k hidden layers, and one output neuron (recall that B(x) is a scalar function): this NN candidate B is required to closely match the conditions in Eq. (3) over a discrete set of samples S, which is initialised randomly. The verifier checks whether the candidate B violates any of the conditions in Eq. (3) over the entire set X and, if so, produces one (or more, as in this work) counter-examples c. We add c to the sample set S as the loop restarts, hence forcing the NN to be trained *also* over the generated counter-examples c. Note that the NN retains its old weights, and restarts the training from the weights obtained at the end of the previous session. This loop repeats until the SMT verifier proves that no counter-examples exist or until a timeout is reached. CEGIS offers a scalable and flexible alternative for BC synthesis: on the one hand, the learner does not require soundness, and ensures rapid synthesis by exploiting the training of NN architectures; on the other, the algorithm is *sound*, i.e. a valid output from the SMT-based verifier is provably correct. Of course we cannot claim *completeness*, since CEGIS might in general not terminate with a solution, because it operates over a continuous model.
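The loop can be sketched as follows. This is a deliberately toy stand-in, not the paper's implementation: the 1-D system ẋ = −x, the linear template B(x) = b − x, the grid-search "learner" and the grid-check "verifier" are all illustrative simplifications of the NN training and SMT solving described above.

```python
import random

X, X0, XU = (-2.0, 2.0), (1.0, 2.0), (-2.0, -1.0)   # toy 1-D sets
B = lambda b, x: b - x                              # candidate template
Bdot = lambda b, x: x                               # dB/dx * f(x) = (-1) * (-x)

def sample_loss(b, samples):
    """Hinge-style penalties mimicking the three conditions of Eq. (3)."""
    l = 0.0
    for s in samples:
        if X0[0] <= s <= X0[1]:
            l += max(0.0, B(b, s))                  # want B <= 0 on X0
        if XU[0] <= s <= XU[1]:
            l += max(0.0, -B(b, s) + 1e-3)          # want B > 0 on Xu
        if abs(B(b, s)) <= 0.1:
            l += max(0.0, Bdot(b, s) + 1e-3)        # want Bdot < 0 near B = 0
    return l

def learn(samples):
    """Toy 'learner': grid search over the single parameter b."""
    return min((k / 100.0 for k in range(-150, 151)),
               key=lambda b: sample_loss(b, samples))

def verify(b, grid=4001):
    """Toy 'verifier': dense grid search for violations of Eq. (3)."""
    cexs = []
    for i in range(grid):
        s = X[0] + (X[1] - X[0]) * i / (grid - 1)
        if ((X0[0] <= s <= X0[1] and B(b, s) > 0) or
                (XU[0] <= s <= XU[1] and B(b, s) <= 0) or
                (abs(B(b, s)) < 1e-3 and Bdot(b, s) >= 0)):
            cexs.append(s)
    # return a spread of counter-examples rather than just the first one
    return [cexs[0], cexs[len(cexs) // 2], cexs[-1]] if cexs else []

def cegis(max_iters=20, seed=0):
    rng = random.Random(seed)
    samples = [rng.uniform(*X) for _ in range(50)]  # initial random set S
    for i in range(1, max_iters + 1):
        b = learn(samples)            # Learner proposes a candidate
        cexs = verify(b)              # Verifier certifies or refutes it
        if not cexs:
            return b, i               # certified parameter
        samples += cexs               # enrich S with counter-examples
    return None, max_iters

b, iters = cegis()
print(b, iters)
```

Counter-examples clustered near the unsafe set drag the candidate towards validity in a handful of iterations, illustrating why the sample set can start small and random.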

The performance of the CEGIS algorithm in practice hinges on the effective exchange of information between the learner and the verifier [3]. A core contribution of this work is to tailor the CEGIS architecture to the problem of BC synthesis: we devise several improvements to NN training, such as a bespoke loss function and a multi-layer NN architecture that ensures robustness and outputs a function tailored to the verification engine. Over consecutive loops, the verifier may return similar counter-examples: we thus propose a more informative counter-example generation by the SMT verifier, adapted to the candidate BC and to the underlying dynamical model. These tailored architectural details yield in practice a rapid, efficient, and robust CEGIS loop, which is shown in this work to clearly outperform state-of-the-art methods.

#### 3.1 Training of the Barrier Neural Network

The learner instantiates the candidate BC using the hyper-parameters k and h (depth and width of the NN), trains it over the N samples in the set S, and later refines its training whenever the verifier adds counter-examples to S. The class of candidate BCs comprises multi-layered, feed-forward NNs with *polynomial* and non-polynomial activation functions. Unlike in most learning applications, the choice of polynomial activations stems from the need for interpretable outputs from the NN, whose analytical expression must be readily processed by the verifier. The order γ of the polynomial activations is a hyper-parameter fixed at the start of the procedure: we split the i-th hidden layer into γ portions and apply polynomial activations of order j to the neurons of the j-th portion.

*Example 1 (Polynomial Activations).* Assume a NN composed of one input x, 3 hidden neurons and 1 activation-free output, with polynomial activations of order up to γ = 3. We split the hidden layer into γ sub-vectors, each containing one neuron. The hidden layer after the activation results in

$$z = \begin{bmatrix} W\_1^{(1)}x + b\_1 & (W\_2^{(1)}x + b\_2)^2 & (W\_3^{(1)}x + b\_3)^3 \end{bmatrix}^T,$$

where W\_i^{(1)} is the i-th row of the first-layer weight matrix, and the b\_i form the bias vector.
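The split activation of Example 1 can be sketched for the scalar-input case as follows (illustrative code, not the paper's implementation):

```python
def poly_activation_layer(x, W, b, gamma=3):
    """Scalar input x; the hidden layer is split into gamma portions and the
    j-th portion (1-indexed) is activated by z -> z**j, as in Example 1."""
    pre = [w * x + bi for w, bi in zip(W, b)]     # pre-activations W_i x + b_i
    size = len(pre) // gamma
    z = []
    for j in range(1, gamma + 1):
        lo = (j - 1) * size
        hi = j * size if j < gamma else len(pre)  # last portion takes the rest
        z += [v ** j for v in pre[lo:hi]]
    return z

# Unit weights and zero biases on input x = 2 give [2, 2^2, 2^3]
print(poly_activation_layer(2.0, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]))  # [2.0, 4.0, 8.0]
```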

The learning process updates the NN parameters to improve the satisfaction of the BC conditions in (3): B(x) ≤ 0 for x ∈ X0, B(x) > 0 for x ∈ Xu, and a negative Lie derivative Ḃ (Eq. (2)) over the set implicitly defined by B(x) = 0. The training minimises a loss comprising three terms, namely

$$L = L\_0 + L\_u + L\_d = \frac{1}{N} \sum\_{s\_i \in X\_0} \max \{ \tau\_0, B(s\_i) \} + \frac{1}{N} \sum\_{s\_i \in X\_u} \max \{ \tau\_u, -B(s\_i) \} + \frac{1}{N} \sum\_{s\_i : B(s\_i) = 0} \max \{ \tau\_d, \dot{B}(s\_i) \}, \tag{4}$$

where the s\_i, i = 1,...,N are the samples taken from the set S. The constants τ0, τu, τd are offsets, added to improve the numerical stability of the training. Notably, B(x) = 0 can be a set with small volume, so it is highly unlikely that a single sample s satisfies B(s) = 0. We thus relax this last condition and consider a belt B around B(x) = 0, namely B = {x : |B(x)| ≤ β}, which depends on the hyper-parameter β. Note that we must use continuously differentiable activations throughout, as we require the existence of Lie derivatives (cf. Eq. (2)), and thus cannot leverage simple ReLUs.

*Enhanced Loss Functions* The loss function in Eq. (4) exhibits drawbacks in practice, which suggest a few improvements. The terms L0 and Lu solely penalise samples with an incorrect value of B(x), without rewarding samples with a correct value. The NN thus stops learning as soon as the samples return correct values of B(x), without further increasing the positivity of B over Xu or its negativity over X0. As such, the training often returns a candidate B(x) with values just below τ0 in X0 or just above τu in Xu. These candidates are easily falsified, potentially leading to a large number of CEGIS iterations.

We improve the learning by adopting a (saturated) *Leaky* ReLU, hence rewarding samples that evaluate to a correct value of B(x). Noting that

$$\text{LeakyReLU}(\alpha, x) = \text{ReLU}(x) - \alpha \,\text{ReLU}(-x),\tag{5}$$

where α is a small positive constant, we rewrite term L<sup>0</sup> as

$$L\_0 = \frac{1}{N} \sum\_{s\_i \in X\_0} \text{ReLU}(B(s\_i) - \tau\_0) - \alpha \cdot \text{satReLU}(-B(s\_i) + \tau\_0), \tag{6}$$

where satReLU is the saturated ReLU function<sup>3</sup>. The term Lu is modified similarly. The composite loss function works as follows. Incorrect samples account for the main contribution to the loss, leading the NN to correct those first via the ReLU term in Eq. (6). At a second stage, the network finds a direction of improvement by following the *leaky* portion of the loss function. This portion is saturated to prevent the training from following only one of these directions without improving the other loss terms.

Another possible drawback of the loss function in (4) derives from the term Ld: it solely penalises the sample points within the belt B. To quickly, and myopically, improve the loss function, the training can generate a candidate BC for which no samples lie within B - we experimentally find that this behaviour persists regardless of the value of β. Similarly to L0 and Lu, we reward the points within the belt that fulfil the BC condition: namely, we apply the satReLU function solely to reward samples s with a negative Ḃ(s), whilst not penalising values Ḃ(s) ≥ 0. The training is thus driven to include more samples in B, guided towards a negative Ḃ(s), finally enhancing learning. The expression of Ld results in

$$L\_d = -\frac{1}{N} \sum\_{s \in \mathcal{B}} \text{satReLU}(-\dot{B}(s) + \tau\_d). \tag{7}$$

<sup>3</sup> Let us define M to be an arbitrary upper bound, then satReLU(x) = min(max(0, x), M).

Finally, we choose an asymmetric belt B = {x : −β1 ≤ B(x) ≤ β2}, with β2 > β1 > 0, to ensure both a wider sample set and a stronger safety certificate.
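In code, the two enhanced loss terms can be sketched as below. This is a simplified, per-batch rendition of Eqs. (6) and (7); the saturation bound M = 1 and the default offsets are illustrative choices rather than the paper's tuned values:

```python
def relu(v):
    return max(0.0, v)

def sat_relu(v, M=1.0):
    return min(relu(v), M)   # the saturated ReLU of footnote 3

def loss_L0(B_on_X0, tau0=0.1, alpha=1e-4):
    """Eq. (6): penalise B(s) above tau0; mildly reward margins below it
    via the leaky (saturated) term, so learning does not stall."""
    return sum(relu(v - tau0) - alpha * sat_relu(-v + tau0)
               for v in B_on_X0) / len(B_on_X0)

def loss_Ld(Bdot_on_belt, N, taud=0.1):
    """Eq. (7): only reward belt samples with a negative Lie derivative;
    values Bdot(s) >= 0 are not penalised. N is the total sample count."""
    return -sum(sat_relu(-v + taud) for v in Bdot_on_belt) / N
```

A sample with B(s) well below τ0 contributes the (saturated) reward −α, while a violating sample contributes a dominant positive penalty, matching the two-stage behaviour described above.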

*Multi-layer Networks* Polynomial activation functions generate interpretable barrier certificates with analytical expressions that are readily verifiable by an SMT solver. However, with polynomial networks the use of multi-layer architectures quickly increases the order of the barrier function: a k-layer network with γ-th order activations returns a polynomial of degree γ^k. We have experienced that deep NNs provide numerical robustness to our method, although the verification complexity increases with the order of the polynomial activation functions and with the depth of the NN. As a consequence, our procedure leverages a deep architecture whilst maintaining a low polynomial order, by interleaving linear and polynomial activations over adjacent layers. We have observed that the use of linear activations, particularly in the output layer, positively affects the training: they provide the robustness needed for the synthesis of BCs (see Experimental Results), without increasing the order of the network with new polynomial terms.

*Learning in Separate Batches* The structure of the conditions in (3) and of the learning loss in (4) naturally suggests a separate approach to training. We thus split the dataset S into three batches, S0, Su and Sx, including samples belonging to X0, Xu and X, respectively. During training, we compute the loss function over the three batches in parallel. Similarly, counter-examples generated by the verifier are added to the relevant batch.
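Schematically, the batch split can be written as follows (a sketch; the membership predicates are placeholders for the actual set descriptions):

```python
def split_batches(samples, in_X0, in_Xu):
    """Partition the sample set S into the three training batches S0, Su, Sx,
    according to membership of the initial and unsafe sets."""
    S0 = [s for s in samples if in_X0(s)]
    Su = [s for s in samples if in_Xu(s)]
    Sx = [s for s in samples if not in_X0(s) and not in_Xu(s)]
    return S0, Su, Sx
```

A counter-example returned by the verifier would be appended to whichever of the three lists matches the condition it violates.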

#### 3.2 Certification of the Barrier Neural Network, or Falsification via Counter-examples

Every candidate BC function B(x) generated by the learner must be certified by the verifier. In practice, the SMT-based verifier equivalently aims at finding states that violate the barrier conditions in (3) over the continuous domain X. To this end, we express the *negation* of these requirements, and formulate a nonlinear constrained problem over the reals, as

$$(x \in X\_0 \land B(x) > 0) \lor (x \in X\_u \land B(x) \le 0) \lor (B(x) = 0 \land \dot{B}(x) \ge 0). \tag{8}$$

The verifier searches for solutions of the constraints in Eq. (8), which in general requires manipulating non-convex functions. This can be cumbersome and time-consuming, hence simple expressions of B can enhance the verification procedure. On the one hand, the soundness of our CEGIS procedure relies heavily on the correctness of SMT solving: an SMT solver never incorrectly asserts the absence of solutions for (8). As a result, when it states that formula (8) is unsatisfiable, i.e. returns unsat, B(x) is formally guaranteed to fulfil the BC conditions in Eq. (3). On the other hand, the CEGIS algorithm offers flexibility in the choice of the verifier, hence we implement and discuss two SMT solvers: dReal [13] and Z3 [11]. dReal is a δ-complete solver, namely its unsat decisions are correct [12], whereas when a solution for (8) is found, it comes with a δ-error bound; the value of δ characterises the precision of the procedure. In our setting, it is acceptable to return spurious counter-examples: these are simply used as additional samples and do not invalidate the sound outcome of the procedure, but rather help synthesise a more robust barrier candidate. dReal is capable of handling non-polynomial terms, such as the exponentials or trigonometric functions appearing in the vector fields f of some of the models considered in Section 4. Z3 is a powerful, sound and complete SMT solver, namely its conclusions are provably correct both when it determines the validity of a BC candidate and when it provides counter-examples. The shortcoming of Z3 is that it cannot handle non-polynomial formulae.

*Prioritisation and Relaxation of Constraints* The effectiveness of the CEGIS framework is underpinned by rapid exchanges between the learner and the verifier, as well as by quick NN training and SMT verification procedures. We have experienced that the bottleneck resides in the handling of the constraint ηd = (B(x) = 0 ∧ Ḃ(x) ≥ 0) by the SMT solver, since the formula contains the high-order expression Ḃ(x) and is defined over the thin region of the state space implicitly characterised by B(x) = 0. As a consequence, we prioritise the constraints η0 = (x ∈ X0 ∧ B(x) > 0) and ηu = (x ∈ Xu ∧ B(x) ≤ 0): if either clause is satisfied, i.e. a counter-example is found for at least one of them, the verifier omits testing ηd, and the obtained counter-examples are passed to the learner. The constraint ηd is thus checked solely when η0 and ηu are both deemed unsat. Whenever this occurs and the verification of ηd times out, the solver searches for a solution of the relaxed constraint (|B(x)| < τv ∧ Ḃ(x) ≥ 0), mirroring the improved learning conditions discussed around Eq. (7). Whilst this constraint is arguably easier to solve in general, it may generate spurious counter-examples, namely samples x̄ that satisfy the relaxed constraint but for which B(x̄) ≠ 0. The generation of these samples does not contradict the soundness of the procedure, and indeed improves the robustness of the next candidate BC - this of course comes at the cost of an increased number of CEGIS iterations.
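The prioritisation itself amounts to a short-circuit over the three checks; sketched below, with the three check functions as placeholders that would each wrap an SMT query and return a list of counter-examples:

```python
def verify_prioritised(check_eta0, check_etau, check_etad):
    """Run the cheap clauses eta0 and etau first; invoke the costly etad
    check only when both return no counter-examples."""
    cexs = check_eta0() + check_etau()
    if cexs:
        return cexs          # skip eta_d, hand counter-examples to the learner
    return check_etad()      # only reached when eta0 and etau are unsat
```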

*Increased Information from Counter-examples* The verification task asks an SMT solver to generate a counter-example, namely a (single) instance satisfying Eq. (8). However, a lone sample might not always provide insightful information for the learner to process, and naïvely asking the SMT solver for more than one counter-example can in general be expensive: the verifier solves Eq. (8) to find a first counter-example x̄; then, to find any additional sample, the constraint (x ≠ x̄) must be added and the resulting formula solved again. We are instead interested in finding numerous points invalidating the BC conditions and feeding them to the learner as a batch, or in increasing the information generated by the verifier by finding a sample that maximises the violation of the BC conditions. To this end, we firstly generate a random *cloud* of points around the counter-example: in view of the continuity of the candidate function B, samples around a counter-example are also likely to invalidate the BC conditions. Secondly, starting from the original counter-example, we compute the gradient of B (or of Ḃ) and follow the direction that maximises the violation of the BC constraints: we follow the maximisation of B (resp. Ḃ) when considering x ∈ X0 (resp. x s.t. |B(x)| < τv), and the minimisation when x ∈ Xu. This gradient computation is extremely fast, as it exploits the neural architecture, and it provides more informative samples for further use by the learner.
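Both enhancements can be sketched as follows. This is illustrative only: a finite-difference gradient stands in for the analytical NN gradient, and the radius, step size and step count are hypothetical values.

```python
import random

def cloud(cex, radius=0.05, k=10, rng=random.Random(0)):
    """Random points near a counter-example; by continuity of B they are
    likely to violate the BC conditions too."""
    return [[c + rng.uniform(-radius, radius) for c in cex] for _ in range(k)]

def ascend(B, x0, steps=10, lr=0.1, h=1e-6):
    """Follow the gradient of B to increase the violation (pass -B to
    minimise instead, e.g. for samples in Xu)."""
    x = list(x0)
    for _ in range(steps):
        g = []
        for i in range(len(x)):
            xp, xm = list(x), list(x)
            xp[i] += h
            xm[i] -= h
            g.append((B(xp) - B(xm)) / (2 * h))
        x = [xi + lr * gi for xi, gi in zip(x, g)]
    return x

# Example: starting from the counter-example [1.0], gradient steps on
# B(x) = x^2 move towards larger values of B, i.e. a stronger violation.
worse = ascend(lambda p: p[0] ** 2, [1.0])
print(worse)
```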


### 4 Case Studies and Experimental Results

All experiments are performed on a laptop workstation with 8 GB RAM, running Ubuntu 18.04. We demonstrate that the proposed method finds provably correct BCs on benchmarks from the literature comprising both polynomial and non-polynomial dynamics: we compare our approach against the work in [34], which to the best of our knowledge is the only work on sound synthesis of BCs with NNs, and against the SOS optimisation software SOSTOOLS [18]. Beyond the benchmarks proposed in [34], we newly tackle a hybrid model as well as larger, (up to) 8-dimensional models, which push the boundaries of the verification engine and display a significant extension to the state of the art. To confirm the flexibility of our architecture, we employ the SMT solver dReal in the first four benchmarks, whereas we study the last four using Z3. In all the examples, we use a learning rate of 0.1 for the NN and the loss function of Section 3.1 with α = 10^−4 and τ0 = τu = τd = 0.1. The belt in Eq. (7) is limited by β1 = 0.1, whilst β2 = ∞. Accordingly, the training over a large set B results in a candidate B with a negative derivative over this large region, whose validity is more likely to be certified by the verifier. We set a verification parameter τv = 0.05 (cf. Sec. 3.2), a timeout (later denoted as OOT) of 60 seconds, and a precision for dReal of δ = 10^−6. Table 1 summarises the outcomes. We emphasise that our approach supports any network depth and width. The presented results seek a trade-off between speed (low order, small networks) and expressiveness (high order, larger networks): a different architecture may result in a slower or faster synthesis.

For the first four benchmarks, we compare our procedure, denoted as CEGIS, with the reproduced results from [34], which however does not handle the hybrid model in the fifth benchmark. We have run the algorithm in [34] and report the cumulative synthesis time under the 'Learn' column. However, the verification is not included in its repeatability package, hence we report the verification results from [34], which were generated with much more powerful hardware. Due to this lack of repeatability, we have not run [34] on the larger models. Compared to [34], the outcomes show that we obtain *much faster* synthesis and verification times, whilst requiring as little as 0.1% of the training data (see the Obstacle Avoidance problem): [34] performs a uniform sampling of the space X, hence suffers especially in the 3-D case, where our learning runs *two orders of magnitude* faster. Evidently this gap in performance derives from the different synthesis procedure: it appears to be more advantageous to employ a smaller, randomly sampled initial dataset that is progressively augmented with counter-examples, rather than to uniformly sample the state space and then train the neural network.

Next, we have implemented the SOS optimisation problems from [10] within the software SOSTOOLS [18] to generate barrier candidates, which are polynomials up to order 4 (the maximum order of the polynomial candidates generated by our Learner). In a few instances we had to conservatively approximate the expression of X0 or Xu in order to encode them as SOS programs - this makes their applicability less general. SOSTOOLS successfully found BC candidates for five of the eight benchmarks, and generated them consistently fast, in view of the convex structure of the underlying optimisation problem. However, recall that these techniques lack soundness (also due to numerical errors), which is instead a core asset of our approach. Consequently, we have passed the candidates to the Z3 SMT solver, which should easily handle polynomial formulae: only one of them ('Hybrid Model') has been successfully verified; the candidate for the 'Polynomial Model' has instead been invalidated (namely, Z3 has found a counter-example for it), whereas the verification of the remaining BC candidates has run out of time. For the latter instances, we have experienced that SOSTOOLS generally returns numerically ill-conditioned expressions, namely candidates whose coefficients have rather different magnitudes and many decimal digits: even after rounding, expressions with this structure are known to be hard for SMT solvers to handle [2,5], which results in the long times needed to return an answer - this explains the experienced timeouts. These experiments suggest that embedding SOS programs within a CEGIS loop appears hardly attainable.

Notice that all the case studies are solved with a *small number* of iterations (up to 9) of the CEGIS loop: this feature, along with the limited runtimes, is promising towards tackling synthesis problems over larger models.

For the eight case studies, we report below the full expressions of the dynamics of the models, the spatial domain X (as a set of constraints), the set of initial conditions X0 ⊂ X, and the unsafe set Xu ⊂ X. We add a detailed analysis of the CEGIS iterations involved in the synthesis of the corresponding BCs.


Table 1. Outcomes of the case studies: cumulative times for the Learning and Verification steps are given in seconds; 'Samples' indicates the size of input data for the Learner (in thousands); 'Iters' is the number of iterations of the CEGIS loop (which is specific to our work); × indicates a synthesis or verification failure; OOT denotes a verification timeout. The Hybrid and the three ODE Models are newly introduced in this work.

*Darboux Model* This 2-dimensional model is approached using polynomial BCs. Its analytical expression is

$$\begin{cases} \dot{x} = y + 2xy, \\ \dot{y} = -x + 2x^2 - y^2, \end{cases} \quad \text{with domains} \quad \begin{aligned} X &= \{-2 \le x, y \le 2\}, \\ X\_0 &= \{0 \le x \le 1, \; 1 \le y \le 2\}, \\ X\_u &= \{x + y^2 \le 0\}. \end{aligned}$$

The work [33] reports that methods based on linear matrix inequalities fail to verify this model using polynomial templates of degree 6. Our approach generates the BC shown in Fig. 2 (left) in approximately 30 seconds, roughly half the time of [34], and using only 500 initial samples versus more than 65000. The initial and unsafe sets are depicted in green and red, respectively, whereas the level set B(x) = 0 is outlined in black. The BC is derived from a single-layer architecture of 10 nodes, with linear activations.
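As a simple sanity check of the certified safety claim (an illustrative Euler simulation, not part of the synthesis pipeline), one can integrate the Darboux dynamics from a state in X0 and confirm that the trajectory never enters Xu = {x + y² ≤ 0} while it remains inside X:

```python
def darboux(x, y):
    """Vector field of the Darboux model."""
    return y + 2 * x * y, -x + 2 * x ** 2 - y ** 2

def enters_unsafe(x, y, dt=1e-3, steps=10_000):
    """Forward-Euler integration; stop once the trajectory leaves X = [-2,2]^2,
    where the safety claim no longer applies."""
    for _ in range(steps):
        if not (-2 <= x <= 2 and -2 <= y <= 2):
            return False               # left the domain X without entering Xu
        if x + y ** 2 <= 0:
            return True                # reached the unsafe set Xu
        dx, dy = darboux(x, y)
        x, y = x + dt * dx, y + dt * dy
    return False

print(enters_unsafe(0.5, 1.5))         # (0.5, 1.5) lies in X0
```

This checks a single sampled trajectory only; it is the barrier certificate that extends the guarantee to all of X0.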

*Exponential Model* This model from [17] shows that our approach extends to non-polynomial systems encompassing exponential and trigonometric functions:

$$\begin{cases} \dot{x} = e^{-x} + y - 1, & X = \{-2 \le x, y \le 2\}, \\ \dot{y} = -\sin^{2}x, & X\_{0} = \{(x + 0.5)^{2} + (y - 0.5)^{2} \le 0.16\}, \\ & X\_{u} = \{(x - 0.7)^{2} + (y + 0.7)^{2} \le 0.09\}. \end{cases}$$

Our algorithm provides a valid BC in 16 seconds, around 7% of the time reported in [34], again using solely 1500 initial samples. The BC, depicted in Fig. 2 (centre), results from a single-layer neural architecture of 10 nodes, with a polynomial (γ = 3) activation function.

*Obstacle Avoidance Problem* This 3-dimensional model, originally presented in [6], describes a robotic application: the control of the angular velocity of a

Fig. 2. The BC for the Darboux (top left), Exponential (middle left), and Obstacle Avoidance (the 3-D study, bottom left) models with corresponding vector fields (right column). Initial and unsafe sets are represented in green and red, respectively; the black line outlines the level curve B(x)=0.

two-dimensional airplane, aimed at avoiding a still obstacle. The details are

$$\begin{cases} \dot{x} = v \sin \varphi, \\ \dot{y} = v \cos \varphi, \\ \dot{\varphi} = u, \end{cases} \quad \text{where} \quad u = -\sin \varphi + 3 \cdot \frac{x \sin \varphi + y \cos \varphi}{0.5 + x^2 + y^2}, \quad \text{with domains}$$

$$\begin{aligned} X &= \{-2 \le x, y \le 2, -\pi/2 < \varphi < \pi/2\}, \\ X\_0 &= \{-0.1 \le x \le 0.1, -2 \le y \le -1.8, -\pi/6 < \varphi < \pi/6\}, \\ X\_u &= \{x^2 + y^2 \le 0.04\}. \end{aligned}$$

The BC is obtained from a single-layer NN comprising 10 neurons, using polynomial (γ = 3) activations. Fig. 2 (right) plots the vector field on the plane z = 0. Our procedure takes 1% of the computational time in [34], providing a valid BC within 9 iterations, starting from an initial dataset of 2000 samples.

*Polynomial Model* This model describes a polynomial system [22] and presents initial and unsafe sets with complex, non-convex shapes [34], as follows:

$$\begin{cases} \dot{x} = y, \\ \dot{y} = -x + \tfrac{1}{3}x^3 - y, \end{cases} \quad \text{with domains}$$

$$\begin{aligned} X &= \{-3.5 \le x \le 2, \; -2 \le y \le 1\}, \\ X\_0 &= \{(x - 1.5)^2 + y^2 \le 0.25 \;\vee\; (x \ge -1.8 \wedge x \le -1.2 \wedge y \ge -0.1 \wedge y \le 0.1) \\ &\qquad\qquad \vee\; (x \ge -1.4 \wedge x \le -1.2 \wedge y \ge -0.5 \wedge y \le 0.1)\}, \\ X\_u &= \{(x + 1)^2 + (y + 1)^2 \le 0.16 \;\vee\; (x \ge 0.4 \wedge x \le 0.6 \wedge y \ge 0.1 \wedge y \le 0.5)\}. \end{aligned}$$

SOS-based procedures [16,29] have required high-order polynomial templates, which suggested the use of alternative activation functions. The BC, shown in Fig. 3, is generated using a 10-neuron, two-layer NN with polynomial (γ = 3) and tanh activations. Needing just around 1 minute and only 2300 initial samples, the overall procedure is 30 times faster than that in [34].

*Hybrid Model* We challenge our procedure with a 2-dimensional hybrid model, which extends beyond the capabilities of the approach in [34]. This hybrid framework partitions the set X into two non-overlapping subsets, X1 and X2, each associated with its own dynamics, f1 and f2 respectively. In other words, the model trajectories evolve according to the dynamics f1 when in X1, and according to f2 when in X2.

$$f\_1 = \begin{cases} \dot{x} = y, \\ \dot{y} = -x - 0.5x^3, \end{cases} \qquad f\_2 = \begin{cases} \dot{x} = y, \\ \dot{y} = x - 0.25y^2, \end{cases}$$

with domain X1 = {(x, y) : x < 0} for f1, domain X2 = {(x, y) : x ≥ 0} for f2, and sets

$$\begin{aligned} X &= \{x^2 + y^2 \le 4\}, & X\_0 &= \{(x+1)^2 + (y+1)^2 \le 0.25\}, \\ X\_u &= \{(x-1)^2 + (y-1)^2 \le 0.25\}. \end{aligned}$$

The structure of this model represents a non-trivial task for the verification engine, for which we employ the Z3 SMT solver; the learning phase has instead been quite fast. The BC (Fig. 3) is obtained from a single-layer NN comprising 3 neurons, using polynomial activations with γ = 2, overall in less than 3 seconds, starting from an initial dataset of 500 samples.
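The switched dynamics can be captured directly in code (illustrative sketch of the hybrid vector field defined above):

```python
def f_hybrid(x, y):
    """Vector field of the hybrid model: f1 on X1 = {x < 0}, f2 on X2 = {x >= 0}."""
    if x < 0:
        return y, -x - 0.5 * x ** 3       # f1
    return y, x - 0.25 * y ** 2           # f2
```

A barrier certificate for such a model must satisfy the Lie-derivative condition with respect to whichever of f1, f2 is active at each state.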

Fig. 3. The BC for the polynomial model (top left) and the hybrid model (top right) with the respective vector field (below).

*Larger-dimensional Models* We finally challenge our procedure with three high-order ODEs, respectively of order four, six and eight, to display the general applicability of our counter-example guided BC synthesis. We consider dynamical models described by the following differential equations:

$$x^{(4)} + 3980x^{(3)} + 4180x^{(2)} + 2400x^{(1)} + 576 = 0,\tag{9}$$

$$x^{(6)} + 800x^{(5)} + 2273x^{(4)} + 3980x^{(3)} + 4180x^{(2)} + 2400x^{(1)} + 576 = 0,\quad (10)$$

$$x^{(8)} + 20x^{(7)} + 170x^{(6)} + 800x^{(5)} + 2273x^{(4)}$$

$$+ 3980x^{(3)} + 4180x^{(2)} + 2400x^{(1)} + 576 = 0,\tag{11}$$

where we denote the i-th derivative of the variable x by x(i). We translate each ODE into a state-space model with variables x1,...,xj, where j = 4, 6, 8, respectively. In all three instances, we select as spatial domain X a hyper-sphere of radius 4 centred at the origin; an initial set X0 given by a hyper-sphere<sup>4</sup> of radius 0.25 centred at +**1**[j]; and an unsafe set Xu given by a hyper-sphere of radius 0.16 centred at −**2**[j]. For the synthesis, we employ in all case studies a single-layer, 5-node architecture with polynomial (γ = 1) activation functions. Whilst the verification engine in particular is challenged by the high dimensionality of the models, the CEGIS procedure returns a valid barrier certificate within at most 3 iterations and with very reasonable run times.

<sup>4</sup> We denote by **1**[j] the point of a j-dimensional state space whose components are all equal to 1. For instance, **1**[3] is the 3-dimensional point [1, 1, 1]. Similarly for **2**[j].

*Codebase Robustness* The results in Table 1 are obtained by setting the NN initialisation seed manually, for repeatability. We now test the robustness of the overall algorithm by randomising the initialisation seed. We report in Table 2 the percentage of successful runs and the average time and iteration count, along with minimum and maximum values, over 50 runs. We set timeouts as a maximum running time of 10 minutes, or as 12 CEGIS loops. Notice that small architectures are highly susceptible to initialisation, which renders this test rather challenging. Compared to Table 1, we observe similar performance for the Darboux, Exponential and Hybrid models, vouching for the robustness of our approach. However, performance degrades on the most challenging models. Still, we highlight that the procedure can synthesise a valid BC very rapidly for every benchmark (notice the lower bounds of the computational times). This outcome suggests that a parallel approach, i.e. the procedure running on several networks simultaneously, may be suited to quickly synthesising candidates. Overall, the table shows a high degree of variance, possibly indicating the need for larger architectures to enhance robustness.


Table 2. Percentage of successful runs, average number of iterations and average computational times (in seconds) of the CEGIS procedure, over 50 runs. The square brackets contain the minimum and maximum values obtained.

### 5 Conclusions and Future Work

We have presented a new inductive, formal, automated technique to synthesise neural-based barrier certificates for polynomial and non-polynomial, continuous and hybrid dynamical models. Thanks to a number of architectural choices for the new procedure, our method requires less training data and thus displays faster learning, as well as quicker verification time, than state-of-the-art techniques.

Ongoing work is porting the presented and related [5,2] theoretical results into a software tool [1]. Towards increased automation, future work includes the development of an automated selection of activation functions that are tailored to the dynamical models of interest.

### References



34. Hengjun Zhao, Xia Zeng, Taolue Chen, and Zhiming Liu. Synthesizing Barrier Certificates Using Neural Networks. In *Proceedings of the 23rd International Conference on Hybrid Systems: Computation and Control*, HSCC '20, New York, NY, USA, 2020. Association for Computing Machinery.


# Improving Neural Network Verification through Spurious Region Guided Refinement

Pengfei Yang<sup>1,2</sup>, Renjue Li<sup>1,2</sup>, Jianlin Li<sup>1,2</sup>, Cheng-Chao Huang<sup>3,4</sup>, Jingyi Wang<sup>5</sup>, Jun Sun<sup>6</sup>, Bai Xue<sup>1,2</sup>, and Lijun Zhang<sup>1,2,3</sup>

<sup>1</sup> SKLCS, Institute of Software, Chinese Academy of Sciences, Beijing, China

<sup>2</sup> University of Chinese Academy of Sciences, Beijing, China

<sup>3</sup> Institute of Intelligent Software, Guangzhou, China

<sup>4</sup> CAS Software Testing (Guangzhou) Co., Ltd., Guangzhou, China

<sup>5</sup> Zhejiang University NGICS Platform, Hangzhou, China

<sup>6</sup> Singapore Management University, Singapore, Singapore zhanglj@ios.ac.cn

Abstract. We propose a spurious region guided refinement approach for robustness verification of deep neural networks. Our method starts with applying the DeepPoly abstract domain to analyze the network. If the robustness property cannot be verified, the result is inconclusive. Due to the over-approximation, the computed region in the abstraction may be *spurious* in the sense that it does not contain any true counterexample. Our goal is to identify such spurious regions and to use them to guide the abstraction refinement. The core idea is to make use of the obtained constraints of the abstraction to infer new bounds for the neurons. This is achieved by linear programming techniques. With the new bounds, we iteratively apply DeepPoly, aiming to eliminate spurious regions. We have implemented our approach in a prototypical tool DeepSRGR. Experimental results show that a large number of regions can be identified as spurious, and as a result, the precision of DeepPoly can be significantly improved. As a side contribution, we show that our approach can be applied to verify quantitative robustness properties.

### 1 Introduction

In the seminal work [34], deep neural networks (DNN) have been successfully applied in Go to play against expert humans. Afterwards, they have achieved exceptional performance in many other applications such as image, speech and audio recognition, self-driving cars, and malware detection. Despite these successes, DNNs have also been shown to often lack robustness, and to be vulnerable to adversarial samples [39]. Even for a well-trained DNN, a small (and even imperceptible) perturbation may fool the network. This is arguably one of the major obstacles to deploying DNNs in safety-critical applications like self-driving cars [42] and medical systems [33].

It is thus important to guarantee the robustness of DNNs for safety-critical applications. In this work, we focus on (local) robustness, i.e., given an input and a manipulation region around the input (usually specified according to a certain norm), we verify that a given DNN never makes any mistake on any input in the region. The first work on DNN verification was published in [30]; it focuses on DNNs with sigmoid activation functions and follows a partition-refinement approach. In 2017, Katz et al. [20] and Ehlers [10] independently implemented Reluplex and Planet, two SMT solvers that verify DNNs with the ReLU activation function against properties expressible as SMT constraints. Since 2018, abstract interpretation has been one of the most popular methods for DNN verification, led by AI<sup>2</sup> [13]; subsequent works like [36,37,23,1,35,28,24] have improved AI<sup>2</sup> in terms of efficiency, precision, and support for more activation functions (like sigmoid and tanh), so that abstract-interpretation-based approaches can be applied to DNNs of larger size and more complex structure.

Among the above methods, DeepPoly [37] is one of the most outstanding regarding precision and scalability. DeepPoly is an abstract domain specially developed for DNN verification. It carefully accounts for the structures and the operators of a DNN, and it designs a polytope expression which not only fits these structures and operators to control the loss of precision, but also incurs a very small time overhead, to achieve scalability. However, as an abstract-interpretation-based method, it provides very little insight if it fails to verify the property. In this work, we propose a method to improve DeepPoly by eliminating spurious regions through abstraction refinement. A spurious region is a region computed using the abstract semantics, conjoined with the negation of the property to be verified. This region is spurious in the sense that, if the property is satisfied, then this region, although not empty, does not contain any true counterexample which can be realized in the original program. In this case, we propose a refinement strategy to rule out the spurious region, i.e., to prove that this region does not contain any true counterexamples.

Our approach is based on DeepPoly and improves it by refining the spurious region through linear programming. The core idea is to intersect the abstraction constructed by abstract interpretation with the negation of the property to generate a spurious region, and to perform linear programming on the constraints of the spurious region so that the bounds of the ReLU neurons whose behaviors are uncertain can be tightened. As a result, some of these neurons can be determined to be definitely activated or deactivated, which significantly improves the precision of the abstraction given by abstract interpretation. This procedure can be performed iteratively, and the precision of the abstraction is gradually improved, so that we are likely to rule out the spurious region in some iteration. If we successfully rule out all the possible spurious regions through such iterative refinement, the property is soundly verified. Our method is similar in spirit to counterexample guided abstraction refinement (CEGAR) [6], i.e., we apply abstract interpretation for abstraction and linear programming for refinement. A fundamental difference is that we use the constraints of the spurious region, instead of a concrete counterexample (which is challenging to construct in our setting), as the guidance for refinement.

The same spurious region guided refinement approach is also effective for quantitative robustness verification. Instead of requiring that all inputs in the region be correctly classified, a certain probability of error in the region is allowed. Quantitative robustness is more realistic and general than ordinary robustness, and a DNN verified against quantitative robustness is useful in practice as well. The spurious region guided refinement approach naturally fits this setting, since a comparatively precise over-approximation of the spurious region implies a sound robustness confidence. To the best of our knowledge, this is the first work to verify quantitative robustness of DNNs with a strict soundness guarantee, which distinguishes our approach from previous sampling-based methods like [45,46,3].

In summary, our main contributions are as follows:


*Organisation of the paper.* We provide preliminaries in Section 2. DeepPoly is recalled in Section 3. We present our overall verification framework and the algorithm in Section 4, and discuss quantitative robustness verification in Section 5. Section 6 evaluates our algorithms through experiments. Section 7 reviews related work and concludes the paper.

### 2 Preliminaries

In this section we recall some basic notions on deep neural networks, local robustness verification, and abstract interpretation. Given a vector $x \in \mathbb{R}^m$, we write $x_i$ to denote its $i$-th entry, for $1 \le i \le m$.

### 2.1 Robustness verification of deep neural networks

In this work, we focus on deep feedforward neural networks (DNNs), which can be represented as a function $f : \mathbb{R}^m \to \mathbb{R}^n$, mapping an input $x \in \mathbb{R}^m$ to its output $y = f(x) \in \mathbb{R}^n$. A DNN $f$ often classifies an input $x$ by taking the maximum dimension of the output, i.e., $\arg\max_{1 \le i \le n} f(x)_i$. We denote such a DNN by $C_f : \mathbb{R}^m \to C$, defined by $C_f(x) = \arg\max_{1 \le i \le n} f(x)_i$, where $C = \{1, \ldots, n\}$ is the set of classification classes.
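A minimal sketch of $f$ and the induced classifier $C_f$, for the fully connected ReLU networks considered below. The toy layer shapes are ours, and we use 0-based class indices rather than the paper's $\{1, \ldots, n\}$:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)              # ReLU(z) = max(z, 0), elementwise

def fnn(x, layers):
    """Fully connected ReLU network: an affine map followed by ReLU for
    every layer except the last (a common convention, assumed here)."""
    for A, b in layers[:-1]:
        x = relu(A @ x + b)
    A, b = layers[-1]
    return A @ x + b

def classify(x, layers):
    """C_f(x) = argmax_i f(x)_i (0-based indices here)."""
    return int(np.argmax(fnn(x, layers)))

# a toy 2-2-2 network
layers = [(np.array([[1.0, -1.0], [1.0, 1.0]]), np.array([0.0, 2.5])),
          (np.eye(2), np.zeros(2))]
```

Robustness then asks whether `classify` is constant over a whole neighbourhood of an input, which cannot be established by evaluating finitely many points.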

A DNN has a sequence of layers, including an input layer at the beginning, followed by several hidden layers, and an output layer at the end. The output of a layer is the input of the next layer. Each layer contains multiple neurons, the number of which is known as the dimension of the layer. The DNN $f$ is the composition of the transformations between layers; typically, an affine transformation followed by a non-linear activation function is performed. For an affine transformation $y = Ax + b$, if the matrix $A$ is not sparse, we call such a layer fully connected. A DNN with only fully connected layers and activation functions is a fully connected neural network (FNN). In this work, we focus on the rectified linear unit (ReLU) activation function, defined as $\mathrm{ReLU}(x) = \max(x, 0)$ for $x \in \mathbb{R}$. Typically, a DNN verification problem is defined as follows:

Definition 1. *Given a DNN* $f : \mathbb{R}^m \to \mathbb{R}^n$*, a set of inputs* $X \subseteq \mathbb{R}^m$*, and a property* $P \subseteq \mathbb{R}^n$*, determine whether* $f(X) := \{f(x) \mid x \in X\} \subseteq P$ *holds.*

Local robustness describes the stability of the behaviour of a normal input under a perturbation. The range of input under this perturbation is the robustness region. For a DNN C<sup>f</sup> (x) which performs classification tasks, a robustness property typically states that C<sup>f</sup> outputs the same class on the robustness region.

There are various ways to define a robustness region, and one of the most popular is to use the $L_p$ norm. For $x \in \mathbb{R}^m$ and $1 \le p < \infty$, we define the $L_p$ norm of $x$ as $\|x\|_p = \left(\sum_{i=1}^m |x_i|^p\right)^{1/p}$, and its $L_\infty$ norm as $\|x\|_\infty = \max_{1 \le i \le m} |x_i|$. We write $\bar{B}_p(x, r) := \{x' \in \mathbb{R}^m \mid \|x' - x\|_p \le r\}$ for the (closed) $L_p$ ball around $x \in \mathbb{R}^m$ with radius $r > 0$, which serves as a neighbourhood of $x$, i.e., its robustness region. If we set $X = \bar{B}_p(x, r)$ and $P = \{y \in \mathbb{R}^n \mid \arg\max_i y_i = C_f(x)\}$ in Def. 1, we obtain exactly the robustness verification problem. Hereafter, we set $p = \infty$.

### 2.2 Abstract interpretation for DNN verification

Abstract interpretation [7] is a static analysis method that aims to find an over-approximation of the semantics of programs and other complex systems so as to verify their correctness. Generally we have a function $f : \mathbb{R}^m \to \mathbb{R}^n$ representing the concrete program, a set $X \subseteq \mathbb{R}^m$ representing the property that the input of the program satisfies, and a set $P \subseteq \mathbb{R}^n$ representing the property to verify. The problem is to determine whether $f(X) \subseteq P$ holds. However, in many cases it is difficult to compute $f(X)$ and to decide whether $f(X) \subseteq P$ holds. Abstract interpretation uses abstract domains and abstract transformers to over-approximate sets and functions, so that an over-approximation of the output can be obtained efficiently.

We have a concrete domain $\mathcal{C}$, which includes $X$ as one of its elements. To make computation efficient, we need an abstract domain $\mathcal{A}$ to abstract elements of the concrete domain. We assume that there is a partial order $\le$ on $\mathcal{C}$ and $\mathcal{A}$, which in our setting is the subset relation $\subseteq$. We also have a concretization function $\gamma : \mathcal{A} \to \mathcal{C}$ which maps an abstract element to its concrete semantics; $\gamma(a)$ is the least upper bound of the concrete elements that can be soundly abstracted by $a \in \mathcal{A}$. Naturally, $a \in \mathcal{A}$ is a sound abstraction of $c \in \mathcal{C}$ if and only if $c \le \gamma(a)$.

The design of an abstract domain is one of the most important problems in abstract interpretation because it determines the efficiency and precision. In practice, we use a certain type of constraints to represent the abstract elements in an abstract domain. Classical abstract domains for Euclidean spaces include Box, Zonotope [14,15], and Polyhedra [38].

Not only do we need abstract domains to over-approximate sets, we also need to over-approximate functions. Here we consider the lifting of the function $f : \mathbb{R}^m \to \mathbb{R}^n$, defined as $T_f : \mathcal{P}(\mathbb{R}^m) \to \mathcal{P}(\mathbb{R}^n)$, $T_f(X) := f(X) = \{f(x) \mid x \in X\}$. Now assume an abstract domain $\mathcal{A}_k$ for the $k$-dimensional Euclidean space and the corresponding concretization $\gamma$. A function $T_f^\# : \mathcal{A}_m \to \mathcal{A}_n$ is a sound abstract transformer of $T_f$ if $T_f \circ \gamma \subseteq \gamma \circ T_f^\#$.

When we have a sound abstraction $X^\# \in \mathcal{A}$ of $X$ and a sound abstract transformer $T_f^\#$, we can use the concretization of $T_f^\#(X^\#)$ to over-approximate $f(X)$, since $f(X) = T_f(X) \subseteq T_f(\gamma(X^\#)) \subseteq \gamma \circ T_f^\#(X^\#)$. If $\gamma \circ T_f^\#(X^\#) \subseteq P$, the property $P$ is successfully verified. Obviously, verification through abstract interpretation is sound but not complete. Hereafter, we write $f^\#$ for $T_f^\#$ for simplicity.
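For instance, with the Box (interval) domain mentioned above, sound abstract transformers for affine layers and for ReLU are straightforward. A sketch (function names are ours):

```python
def affine_box(lo, hi, A, b):
    """Sound Box transformer for y = A x + b: per weight sign, pick the
    interval endpoint that minimises (resp. maximises) the output."""
    out_lo = [bi + sum(w * (lo[j] if w >= 0 else hi[j]) for j, w in enumerate(row))
              for row, bi in zip(A, b)]
    out_hi = [bi + sum(w * (hi[j] if w >= 0 else lo[j]) for j, w in enumerate(row))
              for row, bi in zip(A, b)]
    return out_lo, out_hi

def relu_box(lo, hi):
    """Sound Box transformer for y = ReLU(x), applied componentwise."""
    return [max(l, 0.0) for l in lo], [max(u, 0.0) for u in hi]
```

Propagating the box $[-1,1]^2$ through $y = Ax + b$ with $A = \left(\begin{smallmatrix}1 & -1\\ 1 & 1\end{smallmatrix}\right)$, $b = (0, 2.5)^{\mathrm{T}}$ yields $[-2,2] \times [0.5, 4.5]$, and ReLU then yields $[0,2] \times [0.5, 4.5]$: a sound but coarse over-approximation, which motivates the tighter relational domains below.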

AI<sup>2</sup> [13] first adopted abstract interpretation to verify DNNs, and many subsequent works like [36,37,23] focused on improving its efficiency and precision through, e.g., defining new abstract domains. As a deep neural network, the function $f : \mathbb{R}^m \to \mathbb{R}^n$ can be regarded as a composition $f = f_l \circ \cdots \circ f_1$ of its $l+1$ layers, where $f_j$ performs the transformation between the $j$-th and the $(j+1)$-th layer, i.e., it can be an affine transformation or a ReLU operation. If we choose Box, Zonotope, or Polyhedra as the abstract domain, then for linear transformations and the ReLU function, abstract transformers have been developed in [13]. Once we have abstract transformers $f_j^\#$ for these $f_j$, we can conduct abstract interpretation layer by layer as $f_l^\# \circ \cdots \circ f_1^\#(X^\#)$.

### 3 A Brief Introduction to DeepPoly

Our approach relies on the abstract domain DeepPoly [37], which is the state-of-the-art abstract domain for DNN verification. It defines the abstract transformers of multiple activation functions and layers used in DNNs. The core idea of DeepPoly is to give every variable an upper and a lower bound in the form of an affine expression using only variables that appear before it. It can express a polyhedron globally. Moreover, experimentally, it often has better precision than Box and Zonotope domains.

We denote the $n$-dimensional DeepPoly abstract domain by $\mathcal{A}_n$. Formally, an abstract element $a \in \mathcal{A}_n$ is a tuple $(a^{\le}, a^{\ge}, l, u)$, where $a_i^{\le}$ and $a_i^{\ge}$ give the $i$-th variable $x_i$ a lower bound and an upper bound, respectively, in the form of a linear combination of the variables that appear before it, i.e. $\sum_{k=1}^{i-1} w_k x_k + w_0$, for $i = 1, \ldots, n$, and $l, u \in \mathbb{R}^n$ give the concrete lower and upper bounds of each variable, respectively. The concretization of $a$ is defined as

$$\gamma(a) = \{x \in \mathbb{R}^n \mid a\_i^{\leq} \leq x\_i \leq a\_i^{\geq}, \ i = 1, \ldots, n\}. \tag{1}$$

The abstract domain $\mathcal{A}_n$ also requires that its abstract elements $a$ satisfy the invariant $\gamma(a) \subseteq [l, u]$. This invariant helps construct efficient abstract transformers.

For an affine transformation $x_i = \sum_{k=1}^{i-1} w_k x_k + w_0$, we set

$$a\_i^{\leq} = a\_i^{\geq} = \sum\_{k=1}^{i-1} w\_k x\_k + w\_0.$$

By substituting the variables $x_j$ appearing in $a_i^{\le}$ with $a_j^{\le}$ or $a_j^{\ge}$, according to their coefficients, at most $i-1$ times, we can obtain a sound lower bound in the form of a linear combination of the input variables only, and $l_i$ can be computed immediately from the range of the input variables. A similar procedure also works for computing $u_i$.

Fig. 1. Framework of spurious region guided refinement

For a ReLU transformation $x_i = \mathrm{ReLU}(x_j)$, we consider two cases:


Note that for a DNN with only ReLU as non-linear operators, over-approximation occurs only when there are uncertain ReLU neurons, which are over-approximated using a triangle. The key to improving the precision is thus to compute the bounds of the uncertain ReLU neurons as precisely as possible, and to determine the behaviors of as many uncertain ReLU neurons as possible.
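The case split for $x_i = \mathrm{ReLU}(x_j)$ with concrete bounds $[l, u]$ can be sketched as follows, following the DeepPoly relaxation of [37]; we assume the common heuristic of choosing the lower-face slope $\lambda \in \{0, 1\}$ that minimises the area of the triangle:

```python
def deeppoly_relu(l, u):
    """DeepPoly abstraction of y = ReLU(x), x in [l, u].
    Returns ((k_lo, c_lo), (k_up, c_up), (ly, uy)) encoding
    k_lo*x + c_lo <= y <= k_up*x + c_up with y in [ly, uy]."""
    if l >= 0:                      # definitely activated: y = x
        return (1.0, 0.0), (1.0, 0.0), (l, u)
    if u <= 0:                      # definitely deactivated: y = 0
        return (0.0, 0.0), (0.0, 0.0), (0.0, 0.0)
    lam = u / (u - l)               # uncertain: upper face y <= lam*(x - l)
    k = 1.0 if u > -l else 0.0      # lower face y >= k*x, area-minimising choice
    return (k, 0.0), (lam, -lam * l), (0.0, u)
```

For $[l, u] = [-2, 2]$ this yields $0 \le y \le 0.5x + 1$ with $y \in [0, 2]$, the triangular over-approximation of an uncertain neuron.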

DeepPoly also supports activation functions which are monotonically increasing, convex on (−∞, 0] and concave on [0, <sup>+</sup>∞), like sigmoid and tanh, and it supports max pooling layers. Readers can refer to [37] for details.

### 4 Spurious Region Guided Refinement

We explain the main steps of our algorithm, as depicted in Fig. 1. For the input property and network, we first employ DeepPoly as the initial step to compute $f^\#(X^\#)$. The concretization of $f^\#(X^\#)$ is the conjunction of many linear inequalities given in Eq. 1, and for the robustness property $P$, the negation $\neg P$ is the disjunction of several linear inequalities $\neg P = \bigvee_{t \ne C_f(x)} (y_{C_f(x)} - y_t \le 0)$.


Below we give an example, illustrating how refinement can help in robustness verification.

*Example 1.* Consider the network $f(x) = \mathrm{ReLU}\left(\left(\begin{smallmatrix} 1 & -1 \\ 1 & 1 \end{smallmatrix}\right) x + \left(\begin{smallmatrix} 0 \\ 2.5 \end{smallmatrix}\right)\right)$ and the region $\bar{B}_\infty((0, 0)^{\mathrm{T}}, 1)$. The robustness property $P$ here is $y_2 - y_1 > 0$. We first invoke DeepPoly: the lower bound of $y_2 - y_1$ given by DeepPoly is $-0.5$. As a result, the robustness property cannot be verified directly. Fig. 2(a) shows the details of the example.

We fail to verify the property in Example 1 because for the uncertain ReLU relation $y_1 = \mathrm{ReLU}(x_3)$, the abstraction is imprecise, and the key to making the abstraction more precise here is to obtain as tight a bound as possible for $x_3$.

*Example 2.* We use the constraints in Fig. 2(a), plus the additional constraint $y_2 - y_1 \le 0$ (i.e., $\neg P$), as the input of linear programming. Our aim is to obtain tighter bounds on the input neurons $x_1$ and $x_2$, as well as on the uncertain ReLU neuron $x_3$, so the objective functions of the linear programming are $\min x_i$ and $\min -x_i$ for $i = 1, 2, 3$. All three neurons have tighter bounds after the linear programming (see the red part in Fig. 2(b)). Fig. 2(b) shows the run of DeepPoly under these new bounds, where the input range and the abstraction of the uncertain ReLU neuron are both refined. Now the lower bound of $y_2 - y_1$ is $0.25$, so DeepPoly successfully verifies the property.
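The linear programming step of Example 2 can be reproduced with an off-the-shelf LP solver. A sketch using SciPy (we encode only the DeepPoly constraints of Fig. 2(a) conjoined with $\neg P$, so the exact figures in Fig. 2(b) may be presented differently):

```python
from scipy.optimize import linprog

def refine_bounds():
    """Tighten x1, x2, x3 from Example 1 by LP over the DeepPoly
    constraints conjoined with the negated property y2 - y1 <= 0.
    Variables: [x1, x2, x3, y1], with y2 = x1 + x2 + 2.5 substituted in."""
    A_eq = [[1.0, -1.0, -1.0, 0.0]]           # x3 = x1 - x2
    b_eq = [0.0]
    A_ub = [[0.0, 0.0, -0.5, 1.0],            # y1 <= 0.5*x3 + 1 (ReLU relaxation)
            [1.0, 1.0, 0.0, -1.0]]            # x1 + x2 + 2.5 - y1 <= 0 (i.e. notP)
    b_ub = [1.0, -2.5]
    box = [(-1, 1), (-1, 1), (-2, 2), (0, None)]
    bounds = []
    for i in range(3):                        # objectives min x_i and min -x_i
        c = [0.0] * 4
        c[i] = 1.0
        lo = linprog(c, A_ub, b_ub, A_eq, b_eq, box).fun
        c[i] = -1.0
        hi = -linprog(c, A_ub, b_ub, A_eq, b_eq, box).fun
        bounds.append((lo, hi))
    return bounds
```

Under these constraints the LP tightens the boxes to $x_1 \in [-1, 0]$, $x_2 \in [-1, -2/3]$ and $x_3 \in [-1/3, 1]$, after which a second DeepPoly run proceeds as in Fig. 2(b).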

Fig. 2. Example 1 (left) and Example 2 (right): where the red parts are introduced through linear programming based refinement and the blue parts are introduced by a second run of DeepPoly.

#### 4.1 Main algorithm

Alg. 1 presents our algorithm. First we run abstract interpretation to find the uncertain neurons and the spurious regions (Lines 2–5). For each possible spurious region, a while loop iteratively refines the abstraction. In each iteration we perform linear programming to renew the bounds of the input neurons and the uncertain ReLU neurons; when the bound of an uncertain ReLU neuron becomes definitely non-negative or non-positive, the ReLU behavior of this neuron is renewed (Lines 14–20). We use these to guide abstract interpretation in the next step (Lines 21–22). In Line 22, we make sure that during the abstract interpretation, the abstraction of the previously uncertain neurons (namely the neurons that were uncertain before the linear programming step in the same iteration) compulsorily follows the new bounds and the new ReLU behaviors given by the current $C_{\ge 0}$, $C_{\le 0}$, $l$, and $u$; these bounds will not be renewed by abstract interpretation, and the concretization of $Y$ is defined as

$$\gamma(Y) = \{x \mid \forall i. \ Y\_i^{\leq} \leq x\_i \leq Y\_i^{\geq}\} \cap [l, u]. \tag{2}$$

The while loop ends when (i) either we find that the spurious region is infeasible (Lines 11, 24), and we proceed to refine the next spurious region with the label Verified set to True, or (ii) we reach the terminating condition and fail to rule out this spurious region, in which case we return UNKNOWN. If every while loop ends with the label Verified set to True, we have successfully ruled out all the spurious regions and return YES. An observation is that, if some spurious regions have already been ruled out, we can add the constraints of their negations to make the current spurious region smaller, so as to improve the precision (Line 9).

Here we discuss the soundness of Alg. 1. We focus on the while loop and claim that it has the following loop invariant:

Invariant 1 *The abstract element* $Y$ *over-approximates the intersection of the semantics of* $f$ *on* $\bar{B}_\infty(x, r)$ *and the spurious region, i.e.,* $f(\bar{B}_\infty(x, r)) \cap \mathrm{Spu} \subseteq \gamma(Y)$*.*

Algorithm 1 Spurious region guided robustness verification


The initialization of $Y$ is $f^\#(\bar{B}_\infty(x, r))$, which is naturally an over-approximation. The box $X$ is obtained by linear programming on $Y \wedge \mathrm{Spu}$, and $f^\#(X)$ is calculated through abstract interpretation and the bounds given by linear programming on $Y \wedge \mathrm{Spu}$, and thus it remains an over-approximation. It is worth mentioning that, when we run DeepPoly in Line 22, we use the bounds obtained by linear programming to guide DeepPoly, and this may violate the invariant $\gamma(a) \subseteq [l, u]$ mentioned in Sect. 3. Nonetheless, soundness still holds, since the concretization of $Y$ is newly defined in Eq. 2, where both items in the intersection over-approximate $f(\bar{B}_\infty(x, r)) \cap \mathrm{Spu}$. With Invariant 1, Alg. 1 returns YES if for every possible spurious region $\mathrm{Spu}$, the over-approximation of $f(\bar{B}_\infty(x, r)) \cap \mathrm{Spu}$ is infeasible, which implies the soundness of Alg. 1.

### 4.2 Iterative refinement of the spurious region

Here we present more theoretical insight into the iterative refinement of the spurious region. An iteration of the while loop in Alg. 1 can be represented as a function $L : \mathcal{A} \to \mathcal{A}$, where $\mathcal{A}$ is the DeepPoly domain. An interesting observation is that the abstract transformer $f^\#$ in the DeepPoly domain is not necessarily increasing, because different input ranges, even when related by inclusion, may lead to different choices of the abstraction mode of some uncertain ReLU neurons, which may break the inclusion relation of the abstractions. We have encountered such cases in our experiments, as illustrated in the following example.

*Example 3.* Let $f(x) = \mathrm{ReLU}(x)$ with input ranges $I_1 = [-2, 1]$ and $I_2 = [-2, 3]$. We have $f^\#(I_1) = \{(x_1, x_2)^{\mathrm{T}} \in \mathbb{R}^2 \mid -2 \le x_1 \le 1,\ x_2 \ge 0,\ x_2 \le \frac{1}{3}x_1 + \frac{2}{3}\}$ and $f^\#(I_2) = \{(x_1, x_2)^{\mathrm{T}} \in \mathbb{R}^2 \mid -2 \le x_1 \le 3,\ x_2 \ge x_1,\ x_2 \le \frac{3}{5}x_1 + \frac{6}{5}\}$. We observe $(1, 0)^{\mathrm{T}} \in f^\#(I_1)$ but $(1, 0)^{\mathrm{T}} \notin f^\#(I_2)$, which implies that the transformer $f^\#$ is not increasing.

This fact also implies that $L$ is not necessarily increasing, which violates the condition of Kleene's fixed-point theorem [4].

Now we turn to the analysis of the sequence $\{Y_k = L^k(f^\#(\bar{B}_\infty(x, r)))\}_{k=1}^\infty$, where $L^1 := L$ and $L^k := L \circ L^{k-1}$ for $k \ge 2$. First we have the following lemma, showing that in our setting every decreasing chain $S$ in the DeepPoly domain $\mathcal{A}$ has a meet $\sqcap^\# S \in \mathcal{A}$.

Lemma 1. *Let* $\mathcal{A}_n$ *be the* $n$*-dimensional DeepPoly domain and* $\{a^{(k)}\} \subseteq \mathcal{A}_n$ *a decreasing bounded sequence of non-empty abstract elements. If the coefficients in* $a_i^{(k),\le}$ *and* $a_i^{(k),\ge}$ *are uniformly bounded, then there exists an abstract element* $a^* \in \mathcal{A}_n$ *s.t.* $\gamma(a^*) = \bigcap_{k=1}^\infty \gamma(a^{(k)})$*.*

Remark: The condition that the coefficients in $a_i^{(k),\le}$ and $a_i^{(k),\ge}$ are uniformly bounded is naturally satisfied in our setting, since in a DNN the coefficients and bounds involved take only finitely many values. Readers can refer to [50] for a formal proof.

Lemma 1 implies that if our sequence $\{Y_k\}$ is decreasing, then the iterative refinement converges to an abstract element in DeepPoly, which is the greatest fixed point of $L$ smaller than $f^\#(\bar{B}_\infty(x, r))$. A sufficient condition for $\{Y_k\}$ to be decreasing is that during the abstract interpretation in every $Y_k$, every initially uncertain neuron maintains its abstraction mode, i.e. its corresponding $\lambda$ does not change, before its ReLU behavior is determined. A weaker sufficient condition for convergence is that the abstraction mode of uncertain neurons no longer changes after finitely many iterations.

If the abstraction mode of uncertain neurons changes infinitely often, the sequence $\{Y_k\}$ does not converge in general. In this case, we can consider a subsequence in which every $Y_k$ is obtained with the same abstraction mode. Such a subsequence must be decreasing and thus has a meet, which is an accumulation point of the sequence $\{Y_k\}$. Since there are only finitely many choices of abstraction modes, such an accumulation point exists in $\{Y_k\}$, and there are only finitely many accumulation points. We summarise these results in the following theorem, which describes the convergence behavior of our iterative refinement of the spurious region:

Theorem 2. *There exists a subsequence* $\{Y_{n_k}\}$ *of* $\{Y_k\}$ *s.t.* $\{Y_{n_k}\}$ *is decreasing and thus has a meet* $\bigsqcap^{\#} \{Y_{n_k}\}$*. Moreover, the set*

$$\left\{ \bigsqcap\nolimits^{\#} \{Y_{n_k}\} \;\middle|\; \{Y_{n_k}\} \text{ is a decreasing subsequence of } \{Y_k\} \right\}$$

*is finite, and it is a singleton if exactly one abstraction mode of uncertain* ReLU *neurons occurs infinitely often.*

*Proof.* Since the abstraction modes of uncertain ReLU neurons have only finitely many choices, there must be one which occurs infinitely often in the computation of the sequence $\{Y_k\}$, and we choose the subsequence $\{Y_{n_k}\}$ in which every item is computed through this abstraction mode. Obviously $\{Y_{n_k}\}$ is decreasing and thus has a meet.

For a decreasing subsequence $\{Y_{n_k}\}$, we can find a further subsequence in which the abstraction mode of uncertain ReLU neurons does not change, and the two have the same meet. Since there are only finitely many choices of abstraction modes of uncertain ReLU neurons, such accumulation points of $\{Y_k\}$ also take only finitely many values. If exactly one abstraction mode of uncertain ReLU neurons occurs infinitely often, there is obviously only one accumulation point of $\{Y_k\}$.

#### 4.3 Optimizations

In the implementation of our main algorithm, we propose the following optimizations to improve the precision of refinement.

*Optimization 1: More precise constraints in linear programming.* In Line 15 of Alg. 1, taking the linear constraints of the abstract element $Y$ directly into linear programming is not the best choice, because the DeepPoly abstraction of uncertain ReLU neurons is not the tightest available. Planet [10] has a component which gives a more precise linear approximation of uncertain ReLU relations: it uses the linear constraints $y \le \frac{u(x-l)}{u-l}$, $y \ge x$, $y \ge 0$ to over-approximate the relation $y = \mathrm{ReLU}(x)$ with $x \in [l, u]$.
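These three constraints form the so-called triangle relaxation of an uncertain ReLU neuron. A minimal sketch (the function name and tolerance are our own, not part of DeepSRGR or Planet) checks that every point of the exact ReLU graph lies inside the relaxation:

```python
def relu_triangle_region(l, u):
    """Membership predicate for Planet's triangle relaxation of y = ReLU(x)
    with x in [l, u], for an uncertain neuron (i.e. l < 0 < u)."""
    assert l < 0 < u
    def contains(x, y):
        return (y <= u * (x - l) / (u - l) + 1e-9  # upper: chord from (l,0) to (u,u)
                and y >= x                          # lower: the identity branch
                and y >= 0)                         # lower: the zero branch
    return contains

# Every point on the exact ReLU graph satisfies the relaxation:
inside = relu_triangle_region(-2.0, 3.0)
assert all(inside(x / 10.0, max(x / 10.0, 0.0)) for x in range(-20, 31))
```

The relaxation is sound (it contains the whole ReLU graph) but not exact: points strictly between the chord and the two branches also satisfy it.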

*Optimization 2: Priority to work on small spurious regions.* In Line 6 of Alg. 1, we determine the order of refining the spurious regions based on their sizes, i.e., a smaller region is chosen earlier. This is based on the intuition that Alg. 1 works most effectively when the spurious region is small. After the small spurious regions are ruled out, the constraints of large spurious regions can be tightened with the conjunction $\bigwedge_{j=1}^{i-1} (y_{C_f(x)} - y_{t_j} \ge 0)$. It is difficult to strictly determine which spurious region is the smallest, so we resort to the lower bound of $y_{C_f(x)} - y_{t_i}$ given by DeepPoly: the larger this lower bound, the smaller the spurious region is likely to be, and we perform the for loop in Line 6 of Alg. 1 in this order.
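The ordering heuristic can be sketched as follows, assuming a hypothetical mapping from each remaining candidate label $t_i$ to the DeepPoly lower bound of $y_{C_f(x)} - y_{t_i}$ (DeepSRGR's actual data structures are not shown in the text):

```python
def order_spurious_regions(lower_bounds):
    """Order candidate labels for refinement: a larger DeepPoly lower bound
    of y_{C_f(x)} - y_{t_i} suggests a smaller spurious region, so that
    label is processed first."""
    return sorted(lower_bounds, key=lower_bounds.get, reverse=True)

# Labels whose score difference is almost nonnegative are refined first:
bounds = {"t1": -0.3, "t2": -0.05, "t3": -1.2}
assert order_spurious_regions(bounds) == ["t2", "t1", "t3"]
```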

### 5 Quantitative Robustness Verification

In this section we recall the notion of quantitative robustness and show how to verify a quantitative robustness property of a DNN with spurious region guided refinement.

In practice, we may not need the strict condition of robustness to ensure that an input x is not an adversarial example. A notion of mutation testing is proposed in [44,43], which regards an input x as normal if it has a low *label change rate* on its neighbourhood. These works estimate the label change rate of an input statistically, which motivates us to give a formal definition of the property of having a low label change rate, and to consider the verification problem for such a property. Below we recall the definition of *quantitative robustness* [27], where a parameter $0 < \eta \le 1$ represents the confidence of robustness.

Definition 2. *Given a DNN* $C_f : \mathbb{R}^m \to \mathcal{C}$*, an input* $x \in \mathbb{R}^m$*,* $r > 0$*,* $0 < \eta \le 1$*, and a probability measure* $\mu$ *on* $\bar{B}_\infty(x, r)$*,* $f$ *is* $\eta$*-robust at* $x$*, if*

$$\mu(\{x' \in \bar{B}_\infty(x, r) \mid C_f(x') = C_f(x)\}) \ge \eta.$$

Def. 2 has a tight association with the label change rate: if $x$ is $\eta$-robust, then the label change rate should be smaller than, or close to, $1 - \eta$. Hereafter, we set $\mu$ to be the uniform distribution on $\bar{B}_\infty(x, r)$.

It is natural to adapt spurious region guided refinement to quantitative robustness verification. In Alg. 1, instead of returning UNKNOWN when we cannot rule out a spurious region, we record the volume of the box $X$ as an over-approximation of the Lebesgue measure of the spurious region. After we have worked on all the spurious regions, we calculate the sum of these volumes and obtain a sound robustness confidence. Here we do not calculate the volume of the spurious region itself, because the precise computation of the volume of a high-dimensional polytope remains an open problem, and we do not resort to randomized algorithms because they may not be sound.
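A hedged sketch of this bookkeeping, assuming axis-aligned boxes represented as lists of (lower, upper) intervals (the function names are ours, not DeepSRGR's):

```python
from math import prod

def box_volume(box):
    """Volume of an axis-aligned box given as a list of (lo, hi) intervals."""
    return prod(hi - lo for lo, hi in box)

def sound_confidence(input_box, unverified_boxes):
    """Sound lower bound on the robustness confidence eta: subtract from 1
    the total volume fraction of the box hulls of spurious regions that
    could not be ruled out (an over-approximation, since the boxes may
    overlap and exceed the true spurious measure)."""
    total = box_volume(input_box)
    bad = sum(box_volume(b) for b in unverified_boxes)
    return max(0.0, 1.0 - bad / total)

# One unverified box covering 1% of the input region gives eta >= 0.99:
eta = sound_confidence([(0.0, 1.0), (0.0, 1.0)], [[(0.0, 0.1), (0.0, 0.1)]])
assert abs(eta - 0.99) < 1e-9
```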

We further improve the algorithm through the powerset technique [13], a classical and effective way to enhance the precision of abstract interpretation. We split the input region into several subsets and run abstract interpretation on each of them. In our quantitative robustness verification setting, the powerset technique not only improves the precision, but also accelerates the algorithm in some situations: if the subsets have the same volume, and the percentage of subsets on which we may fail to verify robustness is already smaller than $1 - \eta$, then we have successfully verified the $\eta$-robustness property.

### 6 Experimental Evaluation

We implement our approach as a prototype called DeepSRGR. The implementation is based on a re-implementation of the ReLU and affine abstract transformers of DeepPoly in Python 3.7, which we amend accordingly to implement Alg. 1. We use CVXPY [8] as our modeling language for convex optimization problems and CBC [18] as the LP solver. It is worth mentioning that we ignore floating point error in our re-implementation of DeepPoly, because sound linear programming currently does not scale in our experiments. In the termination condition, we set N = 5. The two optimizations in Sect. 4.3 are adopted in all the experiments. All the experiments are conducted on a CentOS 7.7 server with 16 Intel Xeon Platinum 8153 @2.00GHz (16 cores) and 512G RAM, using at most 96 sub-processes concurrently. Readers can find all the source code and other experimental materials at https://iscasmc.ios.ac.cn/ToolDownload/?Tool=DeepSRGR.

*Datasets.* We use MNIST [22] and ACAS Xu [12,17] as the datasets in our experiments. MNIST contains 60 000 grayscale handwritten digits of size 28×28; we can train DNNs to classify these images by the digits written on them. The ACAS Xu system aims to avoid airborne collisions for unmanned aircraft, and it uses an observation table to make decisions for the aircraft. In [19], the observation table is realized by training DNNs instead of storing it explicitly.

*Networks.* On MNIST, we trained seven fully connected networks of the sizes 6 × 20, 3 × 50, 3 × 100, 6 × 100, 6 × 200, 9 × 200, and 6 × 500, where m × n refers to m hidden layers with n neurons in each hidden layer; we name them FNN2 to FNN8, respectively (we also have a small network FNN1 for testing). On ACAS Xu, we randomly choose three networks used in [20], all of size 6 × 50.

#### 6.1 Improvement in precision

First we compare DeepPoly and DeepSRGR in terms of their precision of robustness verification. We consider the following two indices: (i) the maximum radius that the two tools can verify, and (ii) the number of uncertain ReLU neurons whose behaviors can be further determined by DeepSRGR. For each network, we randomly choose three images from the MNIST dataset and calculate the maximum radius that the two tools can verify through a binary search on the seven FNNs. In the column *"# uncertain ReLU"* we record the number of uncertain ReLU neurons when first applying DeepPoly, and also count how many of them are *renewed*, namely become definitely activated/deactivated in later iterations when applying DeepSRGR.

Table 1 shows the results. We can see from Table 1 that DeepSRGR can verify much stronger (i.e., larger maximum radius) robustness properties than DeepPoly. The average number of iterations for ruling out a spurious region is 2.875, and about half of the spurious regions can be ruled out within 2 iterations. DeepSRGR sometimes determines the behaviors of a large proportion of the uncertain ReLU neurons on large networks: considering the last picture of the most challenging network FNN8, more than ninety percent (92.6% ≈ 1269/1371) of the uncertain neurons are renewed. The improvement in precision evaluated in this experiment benefits the verification of both robustness and quantitative robustness, which is why our method is effective in both tasks.

#### 6.2 Robustness verification performance

In this setting, we randomly choose 50 samples from the MNIST dataset. We fix four radii, 0.037, 0.026, 0.021, and 0.015, for the four networks FNN4 – FNN7, respectively, and verify the robustness property with the corresponding radius on the 50 inputs. The radii chosen here are very challenging for the corresponding networks.

Table 2 presents the results. As we can see, DeepSRGR can verify significantly more properties than DeepPoly. Linear programming in DeepSRGR takes a large amount of time in the experiment, and thus DeepSRGR is less efficient (a DeepPoly run takes no


Table 1. Maximum radius which can be verified by DeepPoly and DeepSRGR, and details of DeepSRGR running on its maximum radius, where for the number of renewed uncertain neurons, we show the largest one among the spurious regions. MAX, AVG, and GT mean the maximum, the average, and the grand total among the spurious regions, respectively. The indices of the three images are 414, 481, and 65 in the MNIST dataset.

more than 100 seconds on FNN7). Furthermore, we rerun the 15 examples on FNN4 which are not verified by DeepSRGR, resetting the maximum number of iterations to 20 and 50. We have the following observations:



Table 2. The number of properties that DeepPoly and DeepSRGR verify among the 50 inputs, and the maximum/average running time of DeepSRGR.

Fig. 3. Number of renewed ReLU behaviors in the spurious regions newly ruled out.

We observe that, by increasing the termination threshold N from 5 to 50, only two more of the 15 properties can be additionally verified. This suggests that our method can effectively identify the spurious regions relevant to the verification of the property within a small number of iterations.

#### 6.3 Quantitative robustness verification on ACAS Xu networks

We evaluate DeepSRGR for quantitative robustness verification on the ACAS Xu networks. We randomly choose five inputs and compute the maximum robustness radius for each input on the three networks with DeepPoly through a binary search. In our experiment, the radii for a running example are the maximum robustness radius plus 0.02, 0.03, 0.04, 0.05, and 0.06. We use the powerset technique with 32 splits. For DeepPoly, the robustness confidence it gives is the proportion of the splits on which DeepPoly verifies the property.

Fig. 4 shows the results. We can see that DeepSRGR gives a significantly better over-approximation of $1 - \eta$ than DeepPoly: in more than 90% of the running examples, our over-approximation is no more than one half of that given by DeepPoly, and in more than 75% of the cases, our over-approximation is even smaller than one tenth of that given by DeepPoly.

Fig. 4. Quantitative robustness verification using DeepPoly and DeepSRGR

### 7 Related Works and Conclusion

We have already discussed the works most closely related to ours; here we add some more recent results. Marabou [21] has been developed as the next generation of Reluplex. Recently, verification approaches based on abstraction of DNN models have been proposed in [11,2]. In addition, alternative approaches based on constraint solving [26,29,5,25], layer-by-layer exhaustive search [16], global optimization [31,9,32], functional approximation [47], reduction to two-player games [48,49], and star set abstraction [41,40] have been proposed as well.

In this work, we propose a spurious region guided refinement approach for robustness and quantitative robustness verification of deep neural networks, where abstract interpretation computes an abstraction, and linear programming performs refinement under the guidance of the spurious region. Our experimental results show that our tool can significantly improve the precision of DeepPoly, verify more robustness properties, and often provide quantitative robustness with a strict soundness guarantee.

The abstract interpretation based framework extends readily to different DNN models and different properties, and can incorporate different verification methods. As future work, we will investigate how to further increase the precision by using more precise linear over-approximations like [35].

### Acknowledgement

This work has been partially supported by Key-Area Research and Development Program of Guangdong Province (Grant No. 2018B010107004), National Natural Science Foundation of China (Grant No. 61761136011, 61836005), Natural Science Foundation of Guangdong Province, China (Grant No. 2019A1515011689), and the Fundamental Research Funds for the Zhejiang University NGICS Platform.

### References




Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

**Analysis of Network Communication**

# **Resilient Capacity-Aware Routing**

Stefan Schmid1, Nicolas Schnepf2, and Jiří Srba2(✉)

<sup>1</sup> Faculty of Computer Science, University of Vienna, Vienna, Austria <sup>2</sup> Department of Computer Science, Aalborg University, Aalborg, Denmark

**Abstract.** To ensure high availability, communication networks provide resilient routing mechanisms that quickly change routes upon failures. However, a fundamental algorithmic question underlying such mechanisms is hardly understood: how to verify whether a given network reroutes flows along *feasible* paths, without violating capacity constraints, for up to k link failures? We chart the algorithmic complexity landscape of resilient routing under link failures, considering shortest path routing based on link weights as, e.g., deployed in the ECMP protocol. We study two models: a *pessimistic* model where flows interfere in a worst-case manner along equal-cost shortest paths, and an *optimistic* model where flows are routed in a best-case manner, and we present a complete picture of the algorithmic complexities. We further propose a strategic search algorithm that checks only the critical failure scenarios while still providing correctness guarantees. Our experimental evaluation on a benchmark of Internet and datacenter topologies confirms an improved performance of our strategic search by several orders of magnitude.

### **1 Introduction**

Routing and traffic engineering are among the most fundamental tasks in a communication network. Internet Service Providers (ISPs) today use several sophisticated strategies to efficiently provision their backbone networks to serve intra-domain traffic. This is challenging since, in addition to simply providing reachability, routing protocols should also account for capacity constraints: to meet quality-of-service guarantees, congestion must be avoided. Intra-domain routing protocols are usually based on shortest paths, and in particular the Equal-Cost Multi-Path (ECMP) protocol [24]. Flows are split at nodes where several outgoing links are on shortest paths to the destination, based on per-flow static hashing [7, 30]. In addition to default routing, most modern communication networks also provide support for resilient routing: upon the detection of a link failure, the network nodes quickly and collaboratively recompute the new shortest paths [21].

However, today, we still do not have a good understanding of the algorithmic complexity of shortest path routing subject to capacity constraints, especially under failures. In particular, in this paper we are interested in the basic question: "Given a capacitated network based on shortest path routing (defined by link weights), can the network tolerate up to k link failures without violating capacity constraints?" Surprisingly only little is known about the complexity aspects.

srba@cs.aau.dk

© The Author(s) 2021

J. F. Groote and K. G. Larsen (Eds.): TACAS 2021, LNCS 12651, pp. 411–429, 2021. https://doi.org/10.1007/978-3-030-72016-2_22

Fig. 1: Classification of possible network situations


Fig. 2: Summary of complexity results for capacity problems

Our Contributions. We provide a complete characterization of the algorithmic complexity landscape of resilient routing and introduce two basic models of how traffic is distributed across the multiple shortest paths: a **pessimistic (P)** one where flows add up in a worst-case manner; if a network is resilient in the pessimistic model, it is guaranteed that routing succeeds along any shortest path without overloading links. In the **optimistic (O)** model flows add up in a best-case manner; if a network is resilient in the optimistic model, there exists a specific routing that does not overload the links, but an arbitrary routing may. The two models hence cover the two extremes of the spectrum, and alternative routing schemes, e.g., (pseudo)random routing, lie in between. Figure 1 illustrates the situations that can arise in a network: depending on the scenario, pessimistic (P) or optimistic (O), and whether the routing feasibility test is positive or negative, we can distinguish between three regimes. (1) If routing is feasible even in the pessimistic case, then flows can be safely forwarded by any routing policy without violating any capacity constraints. (2) If the pessimistic test is negative but the optimistic one is positive, then further considerations are required to ensure that flows use the feasible paths (e.g., a clever routing algorithm to find the suitable paths is needed). (3) If even the optimistic test is negative, then no feasible routing solution exists; to be able to successfully route flows in this case, we need to change the network characteristics, e.g., increase the link capacities.

We further distinguish between **splittable (S)** and **nonsplittable (N)** flows, and refer to the four possible problems as **PS**, **PN**, **ON**, and **OS**. Our main complexity results are summarized in Figure 2. We can see that without link failures (Figure 2a), the problems are solvable in polynomial time, except for the ON problem, which becomes NP-complete. Moreover, the pessimistic variants of the problem can be solved even in nondeterministic logarithmic space, implying that they allow for efficient parallelization [33]. On the other hand, the optimistic splittable problem is hard for the class P. For the problems with link failures (Figure 2b), the complexity increases and the problems become co-NP-complete, apart from the ON problem, which becomes more difficult to solve and is complete for the second level of the polynomial hierarchy [33].

The high computational complexity of the instances with link failures may indicate that a brute-force search algorithm exploring all failure scenarios is needed to verify whether routing is feasible. However, we present a more efficient solution, by defining a partial ordering on the possible failure scenarios with the property that for the pessimistic model, we only need to explore the minimum failure scenarios, and for the optimistic model, it is sufficient to explore the maximum failure scenarios. We present an efficient strategic search algorithm implementing these ideas, formally prove its correctness, and demonstrate the practical applicability of strategic search on a benchmark of Internet and datacenter topologies. In particular, we find that our algorithm achieves up to several orders of magnitude runtime savings compared to the brute-force search.

Related Work. Efficient traffic routing has received much attention in the literature, and there also exist empirical studies on the efficiency of ECMP deployments, e.g., in Internet Service Provider Networks [17] or in datacenters [22]. A systematic algorithmic study of routing with ECMP is conducted by Chiesa et al. in [10]. The authors show that in the splittable-flow model [16], even approximating the optimal link-weight configuration for ECMP within any constant factor is computationally intractable. Before their work, it was only known that minimizing congestion is NP-hard (even to just provide "satisfactory" quality [2] and also under path cardinality constraints [5]) and cannot be approximated within a factor of 3/2 [19]. For specific topologies the authors further show that traffic engineering with ECMP remains suboptimal and computationally hard for hypercube networks. We significantly extend these insights into the algorithmic complexity of traffic engineering and introduce the concept of pessimistic and optimistic variants of routing feasibility and provide a complete characterization of the complexity of routing subject to capacity constraints, also in scenarios with failures. Accounting for failures is an important aspect in practice [13, 31] but has not been studied rigorously in the literature before; to the best of our knowledge, so far there only exist heuristic solutions [18] with some notable exceptions such as Lancet [8] (which however does not account for congestion). We propose to distinguish between optimistic and pessimistic flow splitting; existing literature typically revolves around the optimistic scenario.

We note that while we focus on IP networks (and in particular shortest path routing and ECMP), there exist many interesting results on the verification and reachability testing in other types of networks and protocols, including BGP [4, 15], MPLS [25, 38], OpenFlow [1] networks, or stateful networks [29, 32, 41]. While most existing literature focuses on verifying logical properties, such as reachability without considering capacity constraints, there also exist first works dealing with quantitative properties [20, 26, 29].

### **2 Network with Capacities and Demands**

We shall now define the model of a network with link capacities and flow demands and formally specify the four variants of the resilient routing problem. Let N be the set of natural numbers and N<sub>0</sub> the set of nonnegative integers.

**Definition 1 (Network with Capacities and Demands).** A Network with Capacities and Demands (NCD) is a triple N = (V, C, D) where V is a finite set of nodes, C : V × V → N<sub>0</sub> is the capacity function for each network edge (capacity 0 implies the absence of a network link), and D : V × V → N<sub>0</sub> is the end-to-end flow demand between every pair of nodes such that D(v, v) = 0 for all v ∈ V (demand 0 means that there is no flow).

Let N = (V, C, D) be an NCD. A path from v<sub>1</sub> to v<sub>n</sub>, where v<sub>1</sub>, v<sub>n</sub> ∈ V, is any nonempty sequence of nodes v<sub>1</sub>v<sub>2</sub> ··· v<sub>n</sub> ∈ V<sup>+</sup> such that C(v<sub>i</sub>, v<sub>i+1</sub>) > 0 for all i, 1 ≤ i < n. Let s, t ∈ V. By Paths(s, t) we denote the set of all paths from s to t. Let π ∈ Paths(s, t) be a path in N such that π = v<sub>1</sub>v<sub>2</sub> ... v<sub>n</sub>. An edge is a pair of nodes (v, v′) ∈ V × V such that C(v, v′) > 0. We write (v, v′) ∈ π whenever (v, v′) = (v<sub>i</sub>, v<sub>i+1</sub>) for some i, 1 ≤ i < n.

Routes in an NCD are traditionally determined by annotating the links with weights and employing shortest path routing (e.g. ECMP). In case of multiple shortest paths, traffic engineers select either one of the shortest paths or decide to split the flow among the different shortest paths for load-balancing purposes. When one or multiple links fail, the set of shortest paths may change and the routes need to be updated. The weight assignment is usually provided by the network operators and is primarily used for traffic engineering purposes.

**Definition 2 (Weight Assignment).** Let N = (V, C, D) be an NCD. A weight assignment on N is a function W : V × V → N ∪ {∞} that assigns each link a positive weight, where C(v, v′) = 0 implies that W(v, v′) = ∞ for all v, v′ ∈ V.

Assume now a fixed weight assignment for a given NCD N = (V, C, D). Let π = v<sub>1</sub>v<sub>2</sub> ··· v<sub>n</sub> ∈ V<sup>+</sup> be a path from v<sub>1</sub> to v<sub>n</sub>. The weight of the path π is denoted by W(π) and defined by W(π) = Σ<sup>n−1</sup><sub>i=1</sub> W(v<sub>i</sub>, v<sub>i+1</sub>). Let s, t ∈ V. The set of shortest paths from s to t is defined by SPaths(s, t) = {π ∈ Paths(s, t) | W(π) ≠ ∞ and W(π) ≤ W(π′) for all π′ ∈ Paths(s, t)}. As the weights are positive, all shortest paths in the set SPaths(s, t) are acyclic and hence the set is finite (though of possibly exponential size).
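For intuition, W(π) and SPaths(s, t) can be prototyped by brute-force enumeration over simple paths; this is viable only for tiny examples, since the set can be exponentially large as noted above, and the dictionary encoding of W (absent key meaning weight ∞) is our own:

```python
import itertools

def path_weight(W, path):
    """W(pi) = sum of W(v_i, v_{i+1}) along the path; raises KeyError if
    some consecutive pair is not an edge (i.e. has weight infinity)."""
    return sum(W[(path[i], path[i + 1])] for i in range(len(path) - 1))

def shortest_paths(V, W, s, t):
    """SPaths(s, t) for s != t by brute force over simple paths (weights
    are positive, so shortest paths are acyclic)."""
    best, paths = float("inf"), set()
    inner = [v for v in V if v not in (s, t)]
    for r in range(len(inner) + 1):
        for mid in itertools.permutations(inner, r):
            p = (s,) + mid + (t,)
            try:
                w = path_weight(W, p)
            except KeyError:
                continue  # some hop is not an edge: W(pi) = infinity
            if w < best:
                best, paths = w, {p}
            elif w == best:
                paths.add(p)
    return paths

# A diamond with equal weights has two equal-cost shortest paths:
W = {("s", "a"): 1, ("a", "t"): 1, ("s", "b"): 1, ("b", "t"): 1}
assert shortest_paths(["s", "a", "b", "t"], W, "s", "t") == {
    ("s", "a", "t"), ("s", "b", "t")}
```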

For a given NCD N and a set of failed links F, we can now define the NCD N<sup>F</sup> where all links from F are removed.

**Definition 3.** Let N = (V, C, D) be an NCD with weight assignment W, and let <sup>F</sup> <sup>⊆</sup> <sup>V</sup> <sup>×</sup><sup>V</sup> be a set of failed links. We define the pruned NCD <sup>N</sup><sup>F</sup> = (V,C<sup>F</sup> , D) with an updated weight assignment W<sup>F</sup> by

$$C^F(v, v') = C(v, v') \text{ and } W^F(v, v') = W(v, v') \quad \text{if } (v, v') \notin F, \text{ and}$$

$$C^F(v, v') = 0 \text{ and } W^F(v, v') = \infty \quad \text{if } (v, v') \in F.$$

By Paths<sup>F</sup> (s, t) and SPaths<sup>F</sup> (s, t) we denote the sets of the paths and shortest paths between s and t in the network N<sup>F</sup> = (V,C<sup>F</sup> , D) with W<sup>F</sup> .

We shall now define a flow assignment that for each nonempty flow demand between s and t and every failure scenario, determines the amount of traffic that should be routed through the shortest paths between s and t.

**Definition 4 (Flow Assignment).** A flow assignment f in a capacity network N = (V, C, D) with weight assignment W and with the set F ⊆ V × V of failed links is a family of functions f<sup>F</sup><sub>s,t</sub> : SPaths<sup>F</sup>(s, t) → [0, 1] for all s, t ∈ V where D(s, t) > 0, such that Σ<sub>π∈SPaths<sup>F</sup>(s,t)</sub> f<sup>F</sup><sub>s,t</sub>(π) = 1. A flow assignment f is nonsplittable if f<sup>F</sup><sub>s,t</sub>(π) ∈ {0, 1} for all s, t ∈ V and all π ∈ SPaths<sup>F</sup>(s, t). Otherwise the flow assignment is splittable.

The notation [0, 1] denotes the interval of all rational numbers between 0 and 1 and it determines how the load demand between the nodes s and t is split among the routing paths between the two nodes. A nonsplittable flow assignment assigns the value 1 to exactly one routing path between any two nodes s and t. If for a given failure scenario F there is no path between s and t for two nodes with D(s, t) > 0, then there is no flow assignment as the network is disconnected.

**Definition 5.** An NCD N = (V, C, D) is connected for the set of failed links F ⊆ V × V if SPaths<sup>F</sup>(s, t) ≠ ∅ for every s, t ∈ V where D(s, t) > 0.

For a connected NCD, we now define a feasible flow assignment that avoids congestion: the sum of portions of flow demands (determined by the flow assignment) that are routed through each link, may not exceed the link capacity.

**Definition 6 (Feasible Flow Assignment).** Let N = (V, C, D) be an NCD with weight assignment W. Let F ⊆ V × V be a set of failed links s.t. the network remains connected. A flow assignment f is feasible if every link (v, v′) ∈ V × V with C(v, v′) > 0 satisfies

$$\sum_{s,t \in V} \;\; \sum_{\substack{\pi \in \mathit{SPaths}^F(s,t) \\ (v,v') \in \pi}} f^F_{s,t}(\pi) \cdot D(s,t) \;\le\; C(v,v').$$
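Definition 6 amounts to the following load computation, sketched here with paths as node tuples and a hypothetical `flows` mapping from demand pairs (s, t) to lists of (path, fraction) pairs over SPaths<sup>F</sup>(s, t):

```python
from collections import defaultdict

def is_feasible(C, D, flows):
    """Check Definition 6: accumulate, per link, the fractions of each
    demand routed through it, and compare against the link capacity."""
    load = defaultdict(float)
    for (s, t), assignment in flows.items():
        for path, frac in assignment:
            for v, w in zip(path, path[1:]):   # every edge (v, w) on the path
                load[(v, w)] += frac * D[(s, t)]
    return all(load[e] <= C.get(e, 0) for e in load)

# Splitting demand 2 over two disjoint shortest paths respects capacities,
# while sending it all over the capacity-1 path does not:
C = {("s", "a"): 2, ("a", "t"): 2, ("s", "b"): 1, ("b", "t"): 1}
D = {("s", "t"): 2}
assert is_feasible(C, D, {("s", "t"): [(("s", "a", "t"), 0.5), (("s", "b", "t"), 0.5)]})
assert not is_feasible(C, D, {("s", "t"): [(("s", "b", "t"), 1.0)]})
```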

We consider four different variants of the capacity problem.

**Definition 7 (Pessimistic Splittable/Nonsplittable (PS/PN)).** Given an NCD N with a weight assignment and nonnegative integer k, is it the case that for every set F of failed links of cardinality at most k, the network remains connected and every splittable/nonsplittable flow assignment on N with the set F of failed links is feasible?

**Definition 8 (Optimistic Splittable/Nonsplittable (OS/ON)).** Given an NCD N with a weight assignment and a nonnegative integer k, is there a feasible splittable/nonsplittable flow assignment on N for every set of failed links F of cardinality at most k?

A positive answer to the PN capacity problem implies positive answers to both PS and ON problems. A positive answer to either the PS or ON problem implies a positive answer to the OS problem. This is summarized in Figure 3 and it is easy to argue that the hierarchy is strict.

#### Fig. 3: Hierarchy

### **3 Analysis of Algorithmic Complexity**

We now provide the arguments for the upper and lower bounds from Figure 2.

**Algorithm 1** Computation of the shortest path graph function spgs,t

**Input:** NCD N = (V, C, D), weight assignment W and s, t ∈ V
**Output:** Shortest path graph function *spg*<sub>s,t</sub> : V × V → {0, 1}

**if** *dist*(s, t) = ∞ **then**
  *spg*<sub>s,t</sub>(v, v′) := 0 for all v, v′ ∈ V
**else**
  **for** v, v′ ∈ V **do**
    **if** *dist*(s, t) = *dist*(s, v) + W(v, v′) + *dist*(v′, t) **then**
      *spg*<sub>s,t</sub>(v, v′) := 1
    **else**
      *spg*<sub>s,t</sub>(v, v′) := 0
**return** *spg*<sub>s,t</sub>

**Complexity Upper Bounds.** We first present a few useful observations. Because network connectivity can be checked independently for each source s and target t where D(s, t) > 0 by computing the maximum flow [14] between s and t, we obtain the following lemma.

**Lemma 1.** Given an NCD N = (V, C, D) and a nonnegative integer k, it is polynomial-time decidable if <sup>N</sup> remains connected for all sets of failed links <sup>F</sup> <sup>⊆</sup> <sup>V</sup> <sup>×</sup> <sup>V</sup> where <sup>|</sup>F| ≤ <sup>k</sup>.

Next, we present an algorithm that for an NCD N = (V, C, D) with the weight assignment W : V × V → ℕ ∪ {∞} and a given pair of nodes s, t ∈ V computes in polynomial time the function spg^{s,t} : V × V → {0, 1} that assigns the value 1 to exactly those edges that appear on at least one shortest path (w.r.t. the weight assignment W) between s and t. The edges that are assigned the value 1 hence form the shortest path subgraph between s and t. The algorithm uses the function dist(v, v′) that for every two nodes v, v′ ∈ V returns the length of the shortest path (again w.r.t. W) from v to v′, and returns ∞ if v and v′ are not connected. Such an all-pairs shortest path function can be precomputed in polynomial time using e.g. Johnson's algorithm [27]. The function spg^{s,t} is defined by Algorithm 1.
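As a small illustration of Algorithm 1 (this sketch is ours, not part of the paper's reproducibility package [37]), the precomputation of dist and the construction of spg^{s,t} fit in a few lines of Python; we use Floyd–Warshall instead of Johnson's algorithm purely for brevity, and represent the weight assignment W as a dict of edges, with absent edges treated as weight ∞:

```python
from itertools import product

INF = float("inf")

def all_pairs_dist(nodes, W):
    """All-pairs shortest path distances via Floyd-Warshall
    (Johnson's algorithm from the text is asymptotically better)."""
    dist = {(u, v): (0 if u == v else W.get((u, v), INF))
            for u, v in product(nodes, repeat=2)}
    for w in nodes:
        for u, v in product(nodes, repeat=2):
            if dist[u, w] + dist[w, v] < dist[u, v]:
                dist[u, v] = dist[u, w] + dist[w, v]
    return dist

def spg(nodes, W, dist, s, t):
    """Algorithm 1: mark with 1 exactly the edges that lie on at
    least one shortest s-t path."""
    g = {}
    for v, v2 in product(nodes, repeat=2):
        if dist[s, t] == INF:
            g[v, v2] = 0
        else:
            on_sp = dist[s, v] + W.get((v, v2), INF) + dist[v2, t] == dist[s, t]
            g[v, v2] = 1 if on_sp else 0
    return g
```

On a diamond-shaped network s → a → t, s → b → t with unit weights, all four edges lie on a shortest path and are marked with 1, while the non-existent direct edge (s, t) is marked with 0.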

**Lemma 2.** Let N = (V, C, D) be an NCD with weight assignment W and s, t ∈ V. Algorithm 1 runs in polynomial time and the value of spg^{s,t}(v, v′) can be returned in nondeterministic logarithmic space. Moreover, there is an edge (v, v′) ∈ π for some π ∈ SPaths(s, t) iff spg^{s,t}(v, v′) = 1.

We first present results for k = 0 (no link failures) and start by showing that the optimistic splittable variant of the capacity problem is decidable in polynomial time by reducing it to the feasibility of a linear program. Let N = (V, C, D) be an NCD with weight assignment W and let spg^{s,t} be precomputed for all pairs s and t. We construct a linear program over the variables x^{s,t}(v, v′) for all s, t, v, v′ ∈ V, where the variable x^{s,t}(v, v′) represents the fraction of the total demand D(s, t) between s and t that is routed through the link (v, v′). In the equations below, we let s and t range over all nodes that satisfy D(s, t) > 0.

$$1 \ge x^{s,t}(v, v') \ge 0 \quad \text{ for } s, t, v, v' \in V \tag{1}$$

$$\sum\_{v \in V} x^{s,t}(s,v) \cdot spg^{s,t}(s,v) = 1 \quad \text{for } s, t \in V \tag{2}$$

$$\sum\_{v \in V} x^{s,t}(v, t) \cdot spg^{s,t}(v, t) = 1 \quad \text{for } s, t \in V \tag{3}$$

$$\begin{aligned} \sum\_{v' \in V} x^{s,t}(v',v) \cdot spg^{s,t}(v',v) &= \\ \sum\_{v' \in V} x^{s,t}(v,v') \cdot spg^{s,t}(v,v') \quad \text{for } s,t,v \in V, v \notin \{s,t\} \end{aligned} \tag{4}$$

$$\sum\_{s,t \in V} x^{s,t}(v, v') \cdot spg^{s,t}(v, v') \cdot D(s, t) \le C(v, v') \quad \text{ for } v, v' \in V \tag{5}$$

Equation 1 imposes that the flow fraction on any link must be between 0 and 1. Equation 2 makes sure that the whole demand D(s, t) is split along the outgoing links from s that belong to the shortest path graph. Similarly, Equation 3 guarantees that the flows on incoming links to t in the shortest path graph deliver the total demand. Equation 4 is a flow preservation equation among all incoming and outgoing links (in the shortest path graph) connected to every node v. The first four equations define all possible splittings of the flow demands for all s and t such that D(s, t) > 0. Finally, Equation 5 checks that for every link in the network, the total sum of the flows over all s-t pairs does not exceed the link capacity. The size of the constructed system is polynomial in the number of nodes and its feasibility, which can be verified in polynomial time [39], corresponds to the existence of a solution for the OS problem.
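The construction of Equations (2)-(5) can be sketched as follows (a simplified illustration under our own naming, not the paper's implementation). The function builds the equality and inequality rows as coefficient dictionaries indexed by the 4-tuples (s, t, v, v′); the bounds of Equation (1) are left implicit as 0 ≤ x ≤ 1. The rows can then be handed to any polynomial-time LP solver, e.g. `scipy.optimize.linprog`, to decide feasibility:

```python
from itertools import product

def build_lp(nodes, C, D, spg_all):
    """Construct Equations (2)-(5) of the OS formulation (k = 0).

    spg_all[(s, t)] is the shortest path graph spg^{s,t} as a dict over
    all node pairs.  Returns (eqs, ineqs): lists of (coefficients, rhs)
    rows, an equality row meaning sum(coeff * x) == rhs and an
    inequality row meaning sum(coeff * x) <= rhs.
    """
    demands = [(s, t) for s, t in product(nodes, repeat=2)
               if D.get((s, t), 0) > 0]
    eqs, ineqs = [], []
    for s, t in demands:
        g = spg_all[s, t]
        # (2): the whole demand leaves s along shortest-path-graph edges
        eqs.append(({(s, t, s, v): g[s, v] for v in nodes}, 1))
        # (3): the whole demand enters t
        eqs.append(({(s, t, v, t): g[v, t] for v in nodes}, 1))
        # (4): flow preservation at every intermediate node v
        for v in nodes:
            if v in (s, t):
                continue
            row = {(s, t, u, v): g[u, v] for u in nodes}   # incoming
            for u in nodes:                                 # minus outgoing
                row[(s, t, v, u)] = row.get((s, t, v, u), 0) - g[v, u]
            eqs.append((row, 0))
    # (5): capacity constraint per link, summed over all demand pairs
    for v, v2 in product(nodes, repeat=2):
        row = {(s, t, v, v2): spg_all[s, t][v, v2] * D[s, t]
               for s, t in demands}
        ineqs.append((row, C.get((v, v2), 0)))
    return eqs, ineqs
```

Restricting the variables to nonnegative integers in the same system yields the ON formulation of Theorem 2.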

**Theorem 1.** The OS capacity problem without any link failures is decidable in polynomial time.

If we now restrict the variables to nonnegative integers, we get an instance of an integer linear program whose feasibility checking is NP-complete [39] and corresponds to the solution of the nonsplittable optimistic problem.

**Theorem 2.** The ON capacity problem without any link failures is decidable in nondeterministic polynomial time.

Next, we present a theorem stating that both the splittable and nonsplittable variants of the pessimistic capacity problem are decidable in polynomial time and in fact also in nondeterministic logarithmic space (the complexity class NL).

**Theorem 3.** The PS and PN capacity problems without any link failures are decidable in nondeterministic logarithmic space.

Proof. Let N = (V, C, D) be a given NCD with a weight assignment W. Let us consider the shortest path graph represented by spg^{s,t} as defined by Algorithm 1. Clearly, if the set SPaths(s, t) is empty for some s, t ∈ V where D(s, t) > 0, the answer to both the splittable and nonsplittable problem is negative. Otherwise, for each pair s, t ∈ V where D(s, t) > 0, the entire demand D(s, t) can be routed (both in the splittable and nonsplittable case) through any edge (v, v′) that satisfies spg^{s,t}(v, v′) = 1. Hence we can check whether for every edge (v, v′) ∈ V × V it holds that

$$\sum\_{\substack{s,t \in V\\D(s,t)>0}} D(s,t) \cdot spg^{s,t}(v,v') \le C(v,v') \ .$$

If this is the case, then the answer to both the splittable and the nonsplittable pessimistic problem is positive, as there is no flow assignment that can exceed the capacity of any link. On the other hand, if for some link (v, v′) the sum of all demands that can possibly be routed through (v, v′) exceeds the link capacity, the answer to the problem (both splittable and nonsplittable) is negative. The algorithm can be implemented to run in nondeterministic logarithmic space.
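The check from the proof of Theorem 3 can be sketched directly (our illustration, not the paper's implementation). Here `spg_all` maps each demand pair to its shortest path graph as computed by Algorithm 1 over all node pairs, and we assume connectivity of every demand pair has already been verified per Lemma 1:

```python
def pessimistic_feasible(nodes, C, D, spg_all):
    """Theorem 3 check for k = 0: the PS/PN answer is positive iff,
    for every link, the sum of all demands whose shortest path graph
    contains that link fits within the link's capacity."""
    for v in nodes:
        for v2 in nodes:
            load = sum(d * spg_all[s, t][v, v2]
                       for (s, t), d in D.items() if d > 0)
            if load > C.get((v, v2), 0):
                return False
    return True
```

On the diamond network with unit capacities, a single demand of 1 between s and t is feasible (any shortest path carries at most 1 unit), whereas a demand of 2 is not, since the whole demand may be routed nonsplittably over one unit-capacity link.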

Let us now turn our attention to the four variants of the problem under the assumption that up to k links can fail (where k is part of the input to the decision problem). Given an NCD N = (V, C, D) with a weight assignment W, we are asked to check, for all (exponentially many) failure scenarios F ⊆ V × V where |F| ≤ k, whether the pruned NCD N_F with the weight assignment W_F (as defined in Definition 3) satisfies that the network N_F is connected and every flow assignment is feasible (in the pessimistic case), or that there exists a feasible flow assignment (in the optimistic case). As these problems are decidable in polynomial time for PN, PS and OS, we can conclude that the variants of the problems with failures belong to the complexity class co-NP: for the negation of the problems we can guess the failure scenario F for which the problem does not have a solution; this can be verified in polynomial time by Theorems 1 and 3.

**Theorem 4.** The PN, PS and OS problems with link failures are in co-NP.

Finally, the same arguments can also be used for the optimistic nonsplittable problem with failures. However, as deciding the ON problem without failures is solvable only in nondeterministic polynomial time, the extra quantification over all failure scenarios means that the problem belongs to the class Π<sub>2</sub><sup>P</sup> on the second level of the polynomial hierarchy [33]. This complexity class is believed to be computationally more difficult than the problems on the first level of the hierarchy (to which the NP and co-NP problems belong).

**Theorem 5.** The ON problem with link failures is in the complexity class Π<sub>2</sub><sup>P</sup>.

**Complexity Lower Bounds.** We now prove the complexity lower bounds.

**Theorem 6.** The OS capacity problem without any link failures is P-hard under NC-reducibility.

Proof sketch. By an NC-reduction from the P-complete maximum flow problem for directed acyclic graphs [35]: given a directed acyclic graph G with nonnegative edge capacities, two nodes s and t and a number m, is there a flow between s and t that respects the capacities of all edges and has volume at least m? This problem can be rephrased as our OS problem by setting the demand D(s, t) = m and defining a weight assignment so that every relevant edge in G is on some shortest path from s to t. This can be achieved by topologically sorting the nodes (in NC<sup>2</sup> [11,12]) and assigning the weights accordingly.

**Theorem 7.** The PS/PN problems without any link failures are NL-hard.

Proof sketch. Follows from NL-hardness of reachability in digraphs [33].

Next, we show that the ON problem is NP-hard, even with no failures.

**Theorem 8.** The ON capacity problem without any link failures is NP-hard, even for the case where all weights are equal to 1.

Proof. By a polynomial-time reduction from the NP-complete problem CNF-SAT [33]. Let ϕ = c_1 ∧ c_2 ∧ ... ∧ c_n be a CNF-SAT instance where every clause c_i, 1 ≤ i ≤ n, is a disjunction of literals. A literal is either a variable x_1, ..., x_k or its negation x̄_1, ..., x̄_k. If a literal ℓ_j ∈ {x_j, x̄_j} appears in the disjunction for the clause c_i, we write ℓ_j ∈ c_i. A formula ϕ is satisfiable if there is an assignment of the variables x_1, ..., x_k to true or false so that the formula ϕ is satisfied (evaluates to true under this assignment). For a given formula ϕ we now construct an NCD N = (V, C, D) where


$$-\ D(s\_0, s\_k) = n,\text{ and } D(c\_i^s, c\_i^e) = 1 \text{ for all } i, \ 1 \le i \le n.$$

The capacities of edges and flow demands that are not mentioned above are all set to 0 and the weights of all edges are equal to 1. In Figure 4a we give an example of the reduction for a given satisfiable formula. As we consider the nonsplittable problem, the flow demand from s_0 to s_k means that the whole demand of n units must go through either the link (x_i, s_i) or (x̄_i, s_i), for every i. This corresponds to choosing an assignment of the variables to true or false. For every clause c_i we now have a unit flow from c_i^s to c_i^e that goes through a link (ℓ_j, s_j) for one of the literals ℓ_j appearing in the clause c_i. This is only possible if this link is not already occupied by the flow demand from s_0 to s_k; otherwise we exceed the capacity of the link. For each clause c_i we need to find at least one literal ℓ_j so that the flow can go through the edge (ℓ_j, s_j). As the capacity of the edge (ℓ_j, s_j) is n, it is possible to use this edge for all n clauses if necessary. We observe that the capacity network can be constructed in polynomial time, and we shall now argue for the correctness of the reduction.

We can now observe that if ϕ is satisfiable, we can define a feasible flow assignment f by routing the flow demand of n between s_0 and s_k so that it does not use the links corresponding to the satisfying assignment for ϕ, and then every clause in ϕ can be routed through the links corresponding to one of the satisfied literals. For the other direction, where ϕ is not satisfiable, we notice that any routing of the flow demand between s_0 and s_k (corresponding to some truth assignment of ϕ) leaves at least one clause unsatisfied, and it is then impossible to route the flow for such a clause without violating the capacity constraints.

(a) NCD for the formula (x_1 ∨ x_3) ∧ (x_1 ∨ x_2 ∨ x_3). The capacity of unlabelled links is 1, otherwise 2; link weights are 1. Thick lines show a feasible nonsplittable flow assignment.

(b) Additional construction for the formula ∀y_1, y_2. ∃x_1, x_2, x_3. (x_1 ∨ x_3 ∨ y_1 ∨ ȳ_1 ∨ y_2) ∧ (x_1 ∨ x_2 ∨ x_3 ∨ y_2). Capacity of all links is 4 and weight of links is 1. Double arrows are 2-unbreakable links.

(c) Definition of an m-unbreakable link of capacity n with m + 1 intermediate nodes

Fig. 4: Reduction to the ON capacity problem without/with failures

We now extend the reduction from Theorem 8 to the ON case with link failures and prove its hardness for the second level of the polynomial hierarchy.

**Theorem 9.** The ON problem with link failures is Π<sub>2</sub><sup>P</sup>-hard.

Proof. By reduction from the validity of quantified Boolean formulas of the form ∀y_1, y_2, ..., y_m. ∃x_1, x_2, ..., x_k. ϕ where ϕ = c_1 ∧ c_2 ∧ ... ∧ c_n is a Boolean formula in CNF over the variables y_1, ..., y_m, x_1, ..., x_k. The validity problem of such quantified formulas is Π<sub>2</sub><sup>P</sup>-hard (see e.g. [33]). For a given quantified formula, we shall construct an instance of the ON problem such that the formula is valid if and only if the ON problem with up to m link failures (where m is the number of y-variables) has a positive answer. The reduction uses the construction from Theorem 8, where we described a reduction from the validity of the formula ∃x_1, x_2, ..., x_k. ϕ. The construction is further enhanced by introducing new nodes y_j, ȳ_j, e_j and new edges of capacity 2n (where n is the number of clauses) such that C(y_j, ȳ_j) = C(ȳ_j, e_j) = 2n for all j, 1 ≤ j ≤ m.

Now for every clause c_i we add a so-called m-unbreakable edge of capacity n from c_i^s to y_j and from e_j to c_i^e, for all 1 ≤ i ≤ n and 1 ≤ j ≤ m. Moreover, whenever the literal y_j appears in the clause c_i, we also add an m-unbreakable edge from y_j to c_i^e, and whenever the literal ȳ_j appears in the clause c_i, we add an m-unbreakable edge from c_i^s to ȳ_j. The construction of m-unbreakable edges (denoted by double arrows) is given in Figure 4c, where the capacity of each link is set to n. Finally, for each j, 1 ≤ j ≤ m, we add unbreakable edges from s_1 to y_j and from e_j to s_k. The flow demands in the newly constructed network are identical to those from the proof of Theorem 8, the weights of all newly added edges are set to 1, and we set the weight of the two links from s_0 to x_1 and from s_0 to x̄_1 to 6. The reduction can clearly be done in polynomial time. Figure 4b demonstrates an extension of the construction from Figure 4a with additional nodes and links that complete the reduction. Observe that even in case of m link failures, the unbreakable links that consist of m + 1 edge-disjoint paths are still capable of carrying all the necessary flow traffic.

We shall now argue that if the formula ∀y_1, y_2, ..., y_m. ∃x_1, x_2, ..., x_k. ϕ is valid, then the constructed instance of the ON problem with up to m link failures has a solution. We notice that any subset of up to m failed links either breaks, for every j, 1 ≤ j ≤ m, exactly one of the newly added edges (y_j, ȳ_j) and (ȳ_j, e_j), in which case this determines a valid truth assignment for the y-variables and, as in the previous proof, the flow from s_0 to s_k can be routed so that for each clause there is at least one satisfied literal. Otherwise, there is a variable y_j such that both of the edges (y_j, ȳ_j) and (ȳ_j, e_j) are present, and all flow demands can then be routed through these two edges (which have sufficient capacity for this) by using the m-unbreakable edges. For the opposite direction, where the formula is not valid, there is a truth assignment to the y-variables such that, irrespective of the assignment to the x-variables, at least one clause is not satisfied. We simply fail the edges that correspond to such a y-variable assignment, and the same arguments as in the previous proof imply that there is no feasible flow assignment for this failure scenario.

**Theorem 10.** The PN, PS and OS problems with link failures are co-NP-hard.

Proof sketch. By reduction from the NP-complete shortest path most vital edges problem (SP-MVE) [3,36]. The input to SP-MVE is a directed graph G = (V, E) with positive edge weights, two nodes s, t ∈ V and two positive numbers k and H. The question is whether there exist at most k edges in E such that their removal creates a graph in which the length of the shortest path between s and t is at least H. We reduce SP-MVE to the negation of PN/PS in order to demonstrate co-NP-hardness.

We modify G by inserting a new edge between s and t of weight H and capacity 1, while setting the capacity to 2 for all other edges in G. If the SP-MVE problem has a solution F ⊆ E where |F| ≤ k, then the added edge (s, t) becomes one of the shortest paths between s and t under the failure scenario F, and a flow demand of size 2 between s and t can be routed through this edge, violating the capacity constraints. If the SP-MVE problem does not have a solution, then after the removal of at most k links the length of the shortest path between s and t remains strictly less than H, and any flow assignment along the shortest paths is feasible. We hence conclude that the PN/PS problems are co-NP-hard. A small modification of the construction is needed for the hardness of the OS problem.

### **4 A Fast Strategic Search Algorithm**

In order to solve the PS, PN, ON and OS problems, we can enumerate all failure scenarios for up to k failed links (omitting the links with zero capacity), construct the pruned network for each such failure scenario and then apply our algorithms in Theorems 1, 2 and 3. This brute-force search approach is formalized in Algorithm 2 and its worst-case running time is exponential.
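The enumeration underlying the brute-force search can be sketched as follows (our illustration, not the paper's implementation); the number of generated scenarios is the sum of binomial coefficients C(|links|, r) for r ≤ k, which is the source of the exponential worst case:

```python
from itertools import combinations

def failure_scenarios(links, k):
    """Yield every failure scenario with at most k failed links.
    Zero-capacity links are never worth failing, so the caller is
    expected to omit them from `links`."""
    for r in range(k + 1):
        yield from (frozenset(c) for c in combinations(links, r))
```

For each yielded scenario F, the brute-force search builds the pruned network N_F with W_F and applies the decision procedures of Theorems 1, 2 and 3.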

Our complexity results indicate that the exponential behavior of any algorithm solving a co-NP-hard (or even Π<sub>2</sub><sup>P</sup>-hard) problem is unavoidable (unless P = NP). However, in practice many concrete instances can be solved fast if more refined search algorithms are used. To demonstrate this, we present a novel strategic search algorithm for verifying the feasibility of shortest path routing under failures. At the heart of our algorithm lies the idea of reducing the number of explored failure scenarios by skipping the "uninteresting" ones. Let us fix an NCD N = (V, C, D) with the weight assignment W. We define a relation ≺ on failure scenarios such that F ≺ F′ iff for all flow demands we preserve in F′ at least one of the shortest paths that are present under the failure scenario F.

**Definition 9.** Let F, F′ ⊆ V × V. We say that F precedes F′, written F ≺ F′, if SPaths^F(s, t) ⊇ SPaths^{F′}(s, t) and SPaths^F(s, t) ∩ SPaths^{F′}(s, t) ≠ ∅ for all s, t ∈ V where D(s, t) > 0.
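Definition 9 translates directly into code (our illustration, not the paper's implementation). Here each SPaths^F(s, t) is represented as a set of paths, however paths themselves are encoded:

```python
def precedes(spaths_F, spaths_Fp, D):
    """Definition 9: F precedes F' iff, for every demand pair, the
    shortest paths under F' are contained in those under F and at
    least one of them survives.  spaths_F[(s, t)] is the set
    SPaths^F(s, t)."""
    for (s, t), d in D.items():
        if d <= 0:
            continue
        a, b = spaths_F[s, t], spaths_Fp[s, t]
        if not (a >= b and a & b):   # superset test and nonempty intersection
            return False
    return True
```

Note that once a ⊇ b holds, the intersection a ∩ b equals b, so the second condition simply requires that at least one shortest path remains under F′.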

We first show that if F ≺ F′ and the failure scenario F has a feasible routing solution for the pessimistic problem, then F′ also has a solution. Thus, instead of exploring all possible failure scenarios as in the brute-force algorithm, it is sufficient to explore only the failure scenarios that are minimal w.r.t. the ≺ relation.

**Lemma 3.** Let F, F′ ⊆ V × V where F ≺ F′. A positive answer to the PS/PN problem for the network N_F with weight assignment W_F implies a positive answer to the PS/PN problem for the network N_{F′} with weight assignment W_{F′}.

For the optimistic scenario, the implication is valid in the opposite direction: it is sufficient to explore only the maximal failure scenarios w.r.t. ≺.

**Lemma 4.** Let F, F′ ⊆ V × V where F ≺ F′. A positive answer to the OS/ON problem for the network N_{F′} with weight assignment W_{F′} implies a positive answer to the OS/ON problem for the network N_F with weight assignment W_F.

Hence for the pessimistic scenario, the idea of strategic search is to ignore failure scenarios that remove only some of the shortest paths but preserve at least one of such shortest paths. For the optimistic scenario, we on the other hand explore only the maximal failure scenarios where removing one additional link causes the removal of all shortest paths for at least one source and destination.

In our algorithm, we use the notation spg_F^{s,t} for the shortest path graph as defined in Algorithm 1 for the input graph N_F with weight assignment W_F. The function min_cuts(spg_F^{s,t}, s, t) returns the set of all minimum cuts separating the nodes s and t (sets of edges that disconnect the source node s from the target node t in the shortest path graph spg_F^{s,t}). This function can be computed e.g. using the Provan and Shier algorithm [34], assuming that each edge has unit weight and hence minimizing the number of edges in the minimum cut. There can be several incomparable minimum cuts (with the same number of edges), and by mincut_size(spg_F^{s,t}, s, t) we denote the number of edges in each of the minimum cuts from the set min_cuts(spg_F^{s,t}, s, t).
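A brute-force substitute for min_cuts can be sketched as follows (our illustration: exponential in the number of edges, so only suitable for small shortest path graphs, unlike the algorithm cited in the text [34]); mincut_size is then simply the cardinality of any returned cut:

```python
from itertools import combinations

def min_cuts(edges, s, t):
    """All minimum s-t edge cuts of a shortest path graph, found by
    enumerating edge subsets of increasing size.  `edges` is the set
    of directed unit-weight edges with spg value 1."""
    def connected(remaining):
        # directed DFS over the surviving edges
        seen, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for (a, b) in remaining:
                if a == u and b not in seen:
                    seen.add(b)
                    stack.append(b)
        return t in seen

    if not connected(edges):
        return []                      # nothing to cut
    for size in range(1, len(edges) + 1):
        cuts = [set(c) for c in combinations(edges, size)
                if not connected(edges - set(c))]
        if cuts:
            return cuts                # all cuts of minimum cardinality
    return []
```

On the diamond shortest path graph s → a → t, s → b → t, the minimum cut size is 2 and there are four incomparable minimum cuts.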

Algorithm 3 now presents our fast search strategy, called strategic search. The input to the algorithm is the same as for the brute-force search. The algorithm initializes the pending set of failure scenarios to be explored with the empty failure scenario, and it remembers the set of passed failure scenarios that were already verified. In the main while loop, a failure scenario F is removed from the pending set and, depending on the type τ of the problem, we either directly verify the scenario F in the case of the pessimistic problems, or we call the function MaxFailureCheck(F) that instead verifies all maximal failure scenarios F′ such that F ≺ F′. The correctness of Algorithm 3 is formally stated as follows.

**Theorem 11.** Algorithm 3 terminates and returns true iff the answer to the τ -problem is positive.

### **Algorithm 3** Strategic search

```
1: Input: NCD N = (V, C, D) with weight assignment W, a number k ≥ 0 and type
   of capacity problem τ ∈ {PS, PN, ON, OS}
2: Output: true if the answer to the τ-problem is positive, else false
3: pending := {∅}   /* initialize the pending set with the empty failure scenario */
4: passed := ∅      /* already processed failure scenarios */
5: while pending ≠ ∅ do
6:   let F ∈ pending; pending := pending \ {F}
7:   switch τ do
8:     case τ ∈ {PS, PN}: build N_F and W_F by Definition 3, use Theorem 3
9:       if the answer to the τ-problem was negative then return false
10:    case τ ∈ {OS, ON}: call MaxFailureCheck(F)
11:  passed := passed ∪ {F}
12:  for s, t ∈ V such that D(s, t) > 0 do
13:    if |F| + mincut_size(spg_F^{s,t}, s, t) ≤ k then
14:      succ := {F ∪ C | C ∈ min_cuts(spg_F^{s,t}, s, t), F ∪ C ∉ (pending ∪ passed)}
15:      pending := pending ∪ succ
16: endwhile
17: return true
18:
19: procedure MaxFailureCheck(F)   /* to be run only for the optimistic cases */
20:   for s, t ∈ V such that D(s, t) > 0 do
21:     for C ∈ min_cuts(spg_F^{s,t}, s, t) do
22:       for all C′ ⊂ C such that |F ∪ C′| = min(k, |F ∪ C| − 1) do
23:         if F ∪ C′ ∉ passed then
24:           construct N_{F ∪ C′} and W_{F ∪ C′} by Definition 3
25:           switch τ do
26:             case τ = OS: use Theorem 1 and if negative then return false
27:             case τ = ON: use Theorem 2 and if negative then return false
28:           passed := passed ∪ {F ∪ C′}
29:       endfor
30:     endfor
31:   endfor
```
### **5 Experiments**

To evaluate the practical performance of our strategic search algorithms, we conducted experiments on various wide-area and datacenter network topologies. The reproducibility package with our Python implementation can be found at [37].

We study the algorithms' performance on a range of network topologies, and consider both sparse and irregular wide-area networks (using the Internet Topology Zoo [28] data set) as well as dense and regular datacenter topologies (namely fat-tree [9], BCube [23], and Xpander [40]). To model demands, for each topology we consider certain nodes to serve as core nodes which have significant pairwise demands. Overall, we created 24,388 problem instances for our experimental benchmark, out of which we were able to solve 23,934 instances within a 2-hour timeout. In our evaluation, we filter out the trivial instances where the runtime is less than 0.1 second for both the brute-force and the strategic search (as some of the instances e.g. contain a disconnected flow demand already without any failed links). The benchmark contains a mixture of both positive and negative instances for each problem for an increasing number k of failed links.

Fig. 5: Median results, time in seconds (**B**: brute-force search, **S**: strategic search)

Figure 5 shows the median times for each series of experiments for the different scenarios. All experiments for each topology and given problem instance are sorted by the speedup ratio, i.e. B.time divided by S.time; we display the result for the experiment in the middle of each table. Clearly, our strategic search algorithm outperforms the brute-force one by a significant factor in all scenarios. We also report the number of iterations (B.iter and S.iter) of the two algorithms, showing how many failure scenarios had to be explored.

Let us first discuss the pessimistic scenarios in more detail. Figure 6 shows a cactus plot [6] for the wide-area network setting (on the left) and for the datacenter setting (on the right). We note that the y-axis in the figure is logarithmic. For example, to solve the 1500th fastest instance in the wide-area network setting (left), the brute-force algorithm needs more than 100 seconds, while the strategic algorithm solves the problem in less than a second; this corresponds to a speedup of more than two orders of magnitude. For more difficult instances, the difference in runtime continues to grow and reaches several orders of magnitude. For datacenter networks (right), the difference is even larger. The latter can be explained by the fact that datacenters provide a higher path diversity, with multiple shortest paths between source and target nodes, and hence more opportunities for cleverly skipping "uninteresting" failure scenarios. As the pessimistic problems we aim to solve are co-NP-hard, there are necessarily some hard instances also for our strategic search; this is demonstrated by the S-shaped curve showing a significantly increased runtime for the most difficult instances.

Fig. 6: Pessimistic scenario. Left: wide-area networks, right: datacenter networks

We next discuss the optimistic scenarios, including the experiments for both the splittable and nonsplittable cases. Figure 7 shows a cactus plot for the wide-area network setting (on the left) and for the datacenter setting (on the right). Again, our strategic algorithm significantly outperforms the baseline in both scenarios. Interestingly, in the optimistic scenario the relative performance benefit is larger for wide-area networks, as the optimistic strategic search explores all the maximal failure scenarios and there are significantly more such scenarios in the highly connected datacenter topologies. Hence, while for datacenters (right) the strategic search maintains about one order of magnitude better performance, the advantage for wide-area networks grows exponentially.

Fig. 7: Optimistic scenario. Left: wide-area networks, right: datacenter networks

### **6 Conclusion**

We presented a comprehensive study of the algorithmic complexity of verifying feasible routes under failures without violating capacity constraints, covering both optimistic and pessimistic, as well as splittable and nonsplittable scenarios. We further presented algorithms, based on strategic failure scenario enumeration, which we demonstrated to be efficient in realistic scenarios. While our paper charts the complete complexity landscape, there remain several interesting avenues for future research, such as further scalability improvements and a parallelization of the algorithm.

Acknowledgements. Research supported by the Vienna Science and Technology Fund (WWTF) project ICT19-045 and by the DFF project QASNET.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Network Traffic Classification by Program Synthesis**

Lei Shi<sup>1</sup>, Yahui Li<sup>2</sup>, Boon Thau Loo<sup>1</sup>, and Rajeev Alur<sup>1</sup>

<sup>1</sup> University of Pennsylvania, Philadelphia PA 19104, USA {shilei,boonloo,alur}@seas.upenn.edu <sup>2</sup> Tsinghua University, Beijing, China li-yh15@mails.tsinghua.edu.cn

**Abstract.** Writing classification rules to identify interesting network traffic is a time-consuming and error-prone task. Learning-based classification systems automatically extract such rules from positive and negative traffic examples. However, due to limitations in the representation of network traffic and the learning strategy, these systems lack both the expressiveness to cover a range of applications and the interpretability to fully describe the traffic's structure at the session layer. This paper presents the Sharingan system, which uses program synthesis techniques to generate network classification programs at the session layer. Sharingan accepts raw network traces as inputs and reports potential patterns of the target traffic in NetQRE, a domain-specific language designed for specifying session-layer quantitative properties. We develop a range of novel optimizations that reduce the synthesis time for large and complex tasks to a matter of minutes. Our experiments show that Sharingan is able to correctly identify patterns from a diverse set of network traces and generates explainable outputs, while achieving accuracy comparable to state-of-the-art learning-based systems.

**Keywords:** Program synthesis · Network traffic analysis · Supervised learning.

### **1 Introduction**

Network monitoring systems are essential for network infrastructure management. These systems require classification of network traffic at their core. Today, network operators and equipment vendors write classification programs or patterns upfront in order to differentiate target flows such as attacks or undesired application traffic from normal ones. The process of writing these classification programs often requires deep operator insights, can be error prone, and is not easy to extend to handle new scenarios.

There have been a number of recent attempts at automated generation of classifiers for malicious traffic using machine learning [16,38,5,12] and data mining [6,28,34,39,19] techniques. These classifiers have not gained much traction in production systems, in part due to unavoidable false positive reports and the

© The Author(s) 2021

J. F. Groote and K. G. Larsen (Eds.): TACAS 2021, LNCS 12651, pp. 430–448, 2021.

https://doi.org/10.1007/978-3-030-72016-2_23

gap between the learning output and explainable operational insights [31]. These challenges call for a more expressive, interpretable, and maintainable learning-based classification system.

To be specific, such challenges first come from the extra difficulties learning-based systems face in network applications compared to traditional use cases such as recommendation systems, spam mail filtering, or OCR [31]. Misclassifications in network systems have tangible costs, such as the need for operators to manually verify potential false reports. Due to the diverse nature and large data volumes of networks in production environments, entirely avoiding these costly mistakes in a single training stage is unlikely. Therefore, explainability and maintainability play a core role in a usable learning system.

Properly representing network traffic and learnt patterns is another major difficulty. As a data point for classification purposes, a network trace is a sequence of packets of varying lengths listed in increasing timestamp order. Existing approaches frequently compress it into a regular expression or a feature vector for input. Such compression will eliminate session-layer details and intermediate states in network protocols, making it hard to learn application-layer protocols or multi-stage transactions. These representations also require laborious task-specific feature engineering to get effective learning results, which undermines the systems' advantages of automation. It can also be hard to interpret the learning results to understand the intent and structure of the traffic, due to the blackbox model of many machine-learning approaches and the lack of expressiveness in the inputs and outputs to these learning systems.

To address the above limitations, we introduce Sharingan, which uses program synthesis techniques to auto-generate network classification programs from labeled examples of network traffic traces. Sharingan aims to bridge the gap between learning systems and operator insights, by identifying properties of the traffic that can help inform the network operators on its nature, and provide a basis for automated generation of the classification rules. Sharingan does not aim to outperform state-of-the-art learning systems in accuracy, but rather match their accuracy, while generating output that is more explainable and easier to maintain.

To achieve these goals, we adopt techniques from syntax guided program synthesis [1] to generate a NetQRE [37] program that distinguishes the positive and negative examples. NetQRE, which stands for Network Quantitative Regular Expressions, enables quantitative queries for network traffic, based on flow-level regular pattern matching. Given an input network trace, a NetQRE program generates a numerical value that quantifies the matching of the trace with the described pattern. The classification is done by comparing the synthesized program's output for each example with a learnt threshold T. Positive examples fall above T. The synthesized NetQRE program serves the role of network classifier, identifying flows which match the program specifications.

Sharingan has the following key advantages over prior approaches, which either rely on keyword and regular expression generation [6,28,34,39,19] or statistical traffic analysis [16,38,5,12].

**Requires minimal feature engineering:** NetQRE [37] is an expressive language, and allows succinct description of a wide range of tasks ranging from detecting security attacks to enforcing application-layer network management policies. Sharingan can synthesize any network task on raw traffic expressible as a NetQRE program, without any additional feature engineering. This is an improvement over systems based on manually extracted feature vectors. Also, one outstanding feature of search-based program synthesis is that the only a priori knowledge it needs is information about the language itself. No task-specific heuristics are required.

**Efficient implementation:** The NetQRE program synthesized by Sharingan can be compiled, as has been shown in prior work [37], to efficient low-level implementations that can be integrated into routers and other network devices. On the other hand, traditional statistical classifiers are not directly usable or executable in network filtering systems.

**Easy to decipher and edit:** Finally, Sharingan generates NetQRE programs that can be read and edited. Since they are generic executable programs with high expressiveness, the patterns in the program reveal the stateful protocol structure that is used for the classification, which blackbox statistical models, packet-level regular expressions and feature vectors have difficulty describing. The programs are also amenable to calibration by a network operator, for example, to mix in local policies or debug.

The key technical challenge in the design and implementation of Sharingan is the computationally demanding problem of finding a NetQRE expression that separates the positive network traffic examples from the negative ones. This search problem is an instance of syntax-guided synthesis. While syntax-guided synthesis has received a lot of attention in recent years, no existing tools and techniques can solve the instances of interest in our context, due to the unique semantics of NetQRE programs, the complexity of the expressions to be synthesized, and the scale of the data set of network traffic examples used in training. To address this challenge, we devised two novel techniques for optimizing the search, partial execution and merge search, which together achieve orders of magnitude reduction in synthesis time. We summarize our key contributions:

**Synthesis-based classification architecture.** We propose the methodology of reducing a network traffic classification problem to a synthesis from examples instance.

**Efficient synthesis algorithms.** We devise two efficient algorithms, partial execution and merge search, which efficiently explore the program space and enable learning from very large data sets. Independent of our network traffic classification use cases, these algorithms advance the state of the art in program synthesis.

**Implementation and evaluation.** We have implemented Sharingan and evaluated it for a rich set of metrics using the CICIDS2017 [25,7] intrusion detection benchmark database. Sharingan is able to synthesize a large range of network classification programs in a matter of minutes with accuracy comparable to state-of-the-art systems. Moreover, the generated NetQRE program is easy to interpret, tune, and can be compiled into configurations usable by existing network monitoring systems.

### **2 Overview**

Sharingan's workflow is largely similar to a statistical supervised learning system, although the underlying mechanism is different. Sharingan takes labeled positive and negative network traces as input and outputs a classifier that can classify any new incoming trace. To preserve most of the information from input data and minimize the need for feature engineering, Sharingan considers three kinds of properties in a network trace: (1) all available packet-level header fields, (2) position information of each packet within the sequence, and (3) time information associated with each packet.

Specifically, Sharingan represents a network trace as a stream of feature vectors: S = v0, v1, v2,.... Each vector represents a packet. Vectors are listed in timestamp order. Contents of the vector are parsed field values of that packet. For example, we can define

v[0] = ip.src, v[1] = tcp.sport, v[2] = ip.dst,....

Depending on the information available, different sets of fields can be used to represent a packet. By default, we extract all header fields at the TCP/IP level. To make use of the timestamp information, we also append time interval since the previous packet in the same flow to a packet's feature vector. Feature selection is not necessary for Sharingan.
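As an illustration (not the authors' implementation), the packet-to-vector encoding described above can be sketched as follows; the dictionary keys and field ordering are assumptions for the example:

```python
def packet_to_vector(pkt, prev_ts):
    """Flatten parsed header fields into a fixed-order feature vector;
    the last entry is the time since the previous packet in the flow."""
    return [
        pkt["ip.src"],        # v[0]
        pkt["tcp.sport"],     # v[1]
        pkt["ip.dst"],        # v[2]
        pkt["ts"] - prev_ts,  # appended inter-arrival time
    ]

def trace_to_stream(packets):
    """Map a timestamp-ordered packet list to the stream S = v0, v1, v2, ..."""
    stream, prev_ts = [], packets[0]["ts"] if packets else 0
    for pkt in packets:
        stream.append(packet_to_vector(pkt, prev_ts))
        prev_ts = pkt["ts"]
    return stream
```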

The output classifier is a NetQRE program p that takes in a stream of feature vectors. Instead of giving a probability score that the data point is positive, it outputs an integer that quantifies the matching of the stream and the pattern. The program includes a learnt threshold T. Sharingan aims to ensure that p's outputs for positive and negative traces fall on different sides of the threshold T. Comparing p's output for a data point with T generates a label. It is possible to translate p and T into executable rules using a compilation step.

Given the above usage model, a network operator can use Sharingan to generate a NetQRE program trained to distinguish normal traffic from suspected abnormal traffic flagged by unsupervised learning systems. The synthesized programs themselves, as we will later show, form the basis for deciphering each unknown trace. Consequently, traces whose patterns look interesting can be subjected to a detailed manual analysis by the network operator. Moreover, the generated NetQRE programs can be further refined and compiled into a filtering system's rules.

### **3 Background on NetQRE**

NetQRE [37] is a high-level declarative language for querying network traffic. Streams of tokenized packets are matched against regular expressions and aggregated by multiple types of quantitative aggregators. The NetQRE language is defined by the BNF grammar in Listing 1.1.

```
<classifier> ::= <program> > <value>
<program>    ::= <group-by>
<group-by>   ::= (<group-by>)<op>|<feats>
               | <qre>
<qre>        ::= (<qre> <qre>)<op>
               | (<qre>)*<op>
               | <unit>
<unit>       ::= /<re>/
<re>         ::= <re> <re>
               | (<re>)*
               | <pred>
               | _
<pred>       ::= <pred> && <pred>
               | <pred> || <pred>
               | [<feat> == <value>]
               | [<feat> >= <value>]
               | [<feat> <= <value>]
               | [<feat> -> <prefix>]
<feats>      ::= <feat>
               | <feats>, <feat>
<feat>       ::= 0 | 1 | 2 | ...
<op>         ::= max | min | sum
```
Listing 1.1: NetQRE Grammar
As an example, if we want to find out whether any single source is sending more than 100 TCP packets, the desired classifier can be written as the following NetQRE program:

```
( ( / [ip.type = TCP] / )*sum )max|ip.src_ip > 100
```
At the top level, the classifier consists of two parts: on the left, a processing program that maps a network trace to an output number, and on the right, a threshold against which this value is compared. Inputs fall into different classes based on the result of the comparison.

The group-by expression (<group-by>) splits the trace into sub-flows based on the value of the specified field (the source IP address in this example):

( ............ )max|ip.src_ip

Packets sharing the same value in the field will be assigned to the same sub-flow. Sub-flows are processed individually, and their outputs are aggregated according to the aggregation operator (<op>) (the maximum in this example).

In each sub-flow, we want to count the number of TCP packets. This can be broken down into three operations: (1) specifying a pattern stating that a single packet is a TCP packet, (2) specifying that this pattern repeats an arbitrary number of times, and (3) adding 1 to a counter each time this pattern is matched.

(1) is achieved by a plain regular expression involving predicates. A predicate describes properties of a packet that can match or mismatch one packet in the trace. Four types of properties frequently used in networks can be described, corresponding to the four predicate forms in the grammar: equality of a field with a value, a lower bound on a field, an upper bound on a field, and matching a field against a prefix.


Predicates combined by concatenation and Kleene-star form a plain regular expression, which matches a network trace considered as a string of packets.

A unit expression indicates that a plain regular expression should be viewed as atomic for quantitative aggregation (in this case a single TCP packet):

```
/ [ip.type = TCP] /
```
It either matches a substring of the trace and outputs the value 1, or does not match.

To achieve (2) and (3), we need a construct that both connects the regular patterns so as to match the entire flow and, at the same time, aggregates outputs bottom-up from the units. We call it a quantitative regular expression (<qre>). In this example, we use the iteration operator:

( / [ip.type = TCP] / )*sum

It matches exactly like the Kleene-star operator, and at the same time, for each repetition of the sub-pattern, the sub-expression's output is aggregated by the aggregation operator. In this case, the sum is taken, which acts as a counter for the number of TCP packets. The aggregation result for this expression will in turn be returned as an output for higher-level aggregations.
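To make the semantics concrete, here is a plain-Python sketch (our illustration, not part of NetQRE's implementation) of what the running example classifier computes; the packet representation is an assumption:

```python
def classify(trace, threshold=100):
    """Sketch of ( ( /[ip.type = TCP]/ )*sum )max|ip.src_ip > 100."""
    # group-by ip.src_ip: split the trace into sub-flows per source
    subflows = {}
    for pkt in trace:
        subflows.setdefault(pkt["ip.src_ip"], []).append(pkt)
    # ( /[ip.type = TCP]/ )*sum : count TCP packets in each sub-flow
    counts = [sum(1 for p in flow if p["ip.type"] == "TCP")
              for flow in subflows.values()]
    # outer max aggregator, then comparison against the threshold
    return max(counts, default=0) > threshold
```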

The language also supports the concatenation operator:

(<qre> <qre>)<op>

which works analogously to concatenation for regular matching. It aggregates the quantity by applying the <op> to the outputs of the two sub-expressions that match the prefix and suffix.

In addition to this core language, there is a specialization for synthesis purposes. We observe that comparing a field with values that do not appear in any of the given examples is expensive but will not produce any meaningful information. Therefore we use the relative position in the examples' value space instead of a specific value, for example, 50% instead of 3 in the value space {1, 3, 12, 15}.
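One encoding consistent with the example above (our reconstruction, not necessarily the exact formula used by Sharingan) ranks a value by the fraction of the value space at or below it:

```python
def relative_position(value, value_space):
    """Percentage of values in the space that are <= value; 3 in
    {1, 3, 12, 15} maps to 50%, matching the example in the text."""
    ordered = sorted(set(value_space))
    below = sum(1 for v in ordered if v <= value)
    return 100 * below // len(ordered)
```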

### **4 Synthesis Algorithm**

Given a set of positive and negative examples E_p and E_n, respectively, the goal of our synthesis algorithm is to derive a NetQRE program p_f and a threshold T that differentiate E_p from E_n. We start with notations to be used in this section:

**Notation.** p and q denote individual programs, and P and Q denote sets of programs. p_1 → p_2 denotes that it is possible to mutate p_1 following production rules in NetQRE's grammar to obtain p_2. The relation → is transitive. We assume the starting symbol is always <program>.

p(x) denotes program p's output on input x, where x is a sequence of packets and p(x) is a numerical value. If p is an incomplete program, i.e., if p contains some non-terminals, then p(x) = {q(x) | p → q} is a set of numerical values, containing x's outputs through all possible programs p can mutate into. We define p(x).max to be the maximum value in this set. Similarly, p(x).min is the minimum value.

The synthesis goal can be formally defined as: ∀e ∈ E_p, p_f(e) > T and ∀e ∈ E_n, p_f(e) < T.

#### **4.1 Overview**

Our design needs to address two key challenges. First, NetQRE's rich grammar allows a large space of possible programs and many possible thresholds to search over. Second, the need to check each candidate program against a large data set collected from network monitoring tasks poses a scalability challenge to the synthesis.

Fig. 1: Synthesizer Overview

We propose two techniques for addressing these challenges: partial execution (Section 4.2) and merge search (Section 4.3). Figure 1 shows an overview of the synthesizer.

The top-level component is the search planner, which assigns search tasks over subsets of the entire training data to the enumerator in a divide-and-conquer manner. Each such task is a search-based synthesis instance, where the enumerator enumerates all possible programs starting from s_0, expanded using the productions in the NetQRE grammar, until one that can distinguish the assigned subsets of E_p and E_n is found.

The enumerator optimizes for the first challenge by querying the distributed oracle about each partial program's feasibility and doing pruning early. The oracle evaluates partial programs using partial execution. The search planner optimizes for the second challenge by merging search results from subsets of the large training data, so as to save unnecessary checking, which we call the merge search strategy.

We next explain each technique in detail in the rest of this section.

#### **4.2 Partial Execution**

A partial program is an incomplete program with non-terminals. Similar to prior work that overestimates regular expressions and imperative languages for early pruning in the search process [14,29,30], we want to evaluate a partial NetQRE program for the feasibility of all possible completions of it, so as to decide early whether any of them can serve as a proper classifier for E_p and E_n.

This process includes three main steps: (1) finding an equivalent completion p̂ of a partial program p, so that evaluating p̂ on any input x is equivalent to evaluating the combination of all possible completions of p on x, (2) efficiently evaluating p̂(x), and (3) deciding whether to discard p based on the evaluation result.

**Equivalent Completion:** Recall that we define p(x) of a partial program p to be the union of all q(x) such that p → q. Since we mainly care about whether the outputs of positive and negative examples fall on different sides of a threshold, the essential information is the upper and lower bounds of p(x). Therefore, the criterion for an equivalent completion is that the bounds of p̂(x) should include p(x) for any input x.

Many non-terminals have a straightforward equivalent completion. We replace (1) any uncertain numerical value with the largest or smallest possible value depending on the context, (2) any unknown predicate with unknown, (3) any unknown regular expression with `_*`, and (4) any unknown quantitative regular expression with `(/_ _*/)*sum`. We skip the formal proof of correctness of this approach. Intuitively, the first two include all possible values at the position, and the latter two include all possible matching and aggregation strategies for a trace.
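The rewriting can be sketched as a recursive substitution over a toy AST; representing non-terminals as strings, and splitting uncertain values into maximizing/minimizing variants, are assumptions of this illustration:

```python
# Replacements for non-terminals; "<value-max>"/"<value-min>" stand for
# an uncertain numerical value in a maximizing/minimizing context.
COMPLETIONS = {
    "<value-max>": float("inf"),
    "<value-min>": float("-inf"),
    "<pred>": "[unknown]",          # unknown predicate
    "<re>": "_*",                   # unknown regular expression
    "<qre>": "(/_ _*/)*sum",        # unknown quantitative expression
}

def equivalent_completion(node):
    """Recursively replace every non-terminal with its overestimating
    completion, leaving concrete sub-programs untouched."""
    if isinstance(node, str) and node in COMPLETIONS:
        return COMPLETIONS[node]
    if isinstance(node, list):
        return [equivalent_completion(child) for child in node]
    return node
```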

Some non-terminals, such as <group-by> and <op>, do not have an equivalent completion. During enumeration, we put a complexity penalty on these non-terminals if they are not expanded, thereby encouraging their earlier expansion so that partial execution is possible.

**Computing Ambiguity:** Notice that regular patterns naturally allow multiple matching strategies if a character (packet) in the input can match more than one predicate in the program, which is why we can estimate a set of NetQRE programs by one equivalent completion p̂. The goal, and also the major challenge, in evaluating p̂(x) on an arbitrary input x is to compute the quantitative outputs from all valid matching strategies, whose number can grow exponentially with the input trace's length.

Fig. 2: Illustration of an unambiguous program. Predicate A matches the C packets while predicate B matches packet D.

Fig. 3: Illustration of the first 3 steps of strategy one when predicate B is not yet explored.

To solve the problem of too many matching strategies, we use an approximation: merging "close" matching strategies. Two strategies are defined to be "close" if at some step of their matching process (1) they have matched the same number of packets in the trace and (2) the last predicate they have matched is exactly the same. We explore all matching strategies simultaneously and merge whenever two strategies can be identified as close. Notice that each matching strategy maintains a distinct copy of the aggregation states for every <qre> expression. States for the same expression, as well as the final results, are merged into one interval.

As an example, Figures 2–5 illustrate the evaluation process of a partial program during the search for the following pattern, with CCCCD as input:

( ( /AA/ )*sum ( /B/ )*sum )max


Fig. 4: Illustration of the first 3 steps of strategy two

Fig. 5: Illustration of the last 2 steps of merged strategy one & two

By the properties of interval arithmetic and regular expressions, it can be proven that the approximation result strictly contains the true output range. More formally, p̂(x).min ≤ p(x).min ≤ p(x).max ≤ p̂(x).max.

Intuitively, the proposed evaluation scheme works well because we only care about the boundaries of the outputs, which are represented by intervals as the abstract data type. We implement the execution and approximation process using the Data Transducer model proposed by [2], which, for a given program, consumes a small constant amount of memory and time linear in the input trace's length.
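The merging step can be sketched as follows, assuming each matching strategy carries a single aggregation state kept as a (lo, hi) interval; the data layout is an assumption of this illustration:

```python
def merge_close(strategies):
    """strategies: list of ((pkts_matched, last_pred), (lo, hi)) pairs.
    Strategies sharing a key are "close" and are merged into a single
    interval covering both aggregation states, capping the number of
    live strategies."""
    merged = {}
    for key, (lo, hi) in strategies:
        if key in merged:
            mlo, mhi = merged[key]
            merged[key] = (min(mlo, lo), max(mhi, hi))
        else:
            merged[key] = (lo, hi)
    return merged
```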

**Making a Decision:** To decide the fate of a partial program p, let q be a complete program and assume there is only one pair of examples e_p and e_n. For q to correctly classify e_p and e_n, there must be a threshold T such that q(e_n).max < T < q(e_p).min. Therefore, given a pair of examples e_p and e_n, a program q is correct if and only if q(e_n).max < q(e_p).min. When this holds, any value between q(e_n).max and q(e_p).min can be used as the threshold.

**Lemma 1**: There exists a correct program q such that p → q only if p̂(e_n).min < p̂(e_p).max.

**Lemma 2**: If p̂(e_n).max < p̂(e_p).min, then any program q such that p → q is correct.

From Lemma 1, we can decide whether p must be rejected. From Lemma 2, we can decide whether p can be accepted. These criteria can be extended to more than one pair of examples. We omit the formal proofs of the lemmas. Figures 6 and 7 show two intuitive examples explaining the decision-making process (they do not necessarily represent properties of real data sets). Each vertical bar represents the output range of the corresponding data point produced by the program under investigation.

Fig. 6: A correct program found. No negative output can ever be greater than any positive output. 5.5 can be used as a threshold

Fig. 7: A bad program. pos 1 can never be greater than neg 3.
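Generalized to all example pairs, the decision procedure can be sketched as below; representing interval outputs as (lo, hi) pairs is an assumption of this illustration:

```python
def decide(pos_outputs, neg_outputs):
    """Return "accept", "reject", or "explore" for a partial program,
    given interval outputs (lo, hi) of its equivalent completion on
    positive and negative examples (per Lemmas 1 and 2)."""
    # Lemma 2: every negative upper bound below every positive lower
    # bound means any completion is correct.
    if max(hi for _, hi in neg_outputs) < min(lo for lo, _ in pos_outputs):
        return "accept"
    # Lemma 1 (contrapositive): if some negative lower bound reaches a
    # positive upper bound, no completion can be correct.
    if any(nlo >= phi for nlo, _ in neg_outputs for _, phi in pos_outputs):
        return "reject"
    return "explore"  # keep expanding this partial program
```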

#### **4.3 Merge Search**

In the rest of this subsection, we describe three heuristics for scaling up synthesis to large data sets, namely divide and conquer, simulated annealing, and parallel processing. We call the combination of these the merge search technique.

**Divide and Conquer.** Enumerating and verifying programs on large data sets is expensive. Our core strategy to improve performance is to learn patterns on small subsets and merge them into a global pattern with low overhead.

It is based on two observations. First, the pattern of the entire data set is usually shaped by a few extreme data points; looking at these extreme data points locally is enough to figure out critical properties of the global pattern. Second, conflicts between local patterns mostly describe different aspects of the same target rather than fundamental differences, and thus can be resolved by simple merge operations such as disjunction, truncation, or concatenation.

This divide and conquer strategy is captured in the following algorithm:

```
def divide_and_conquer(dataset):
    if dataset.size > threshold:
        subset_l, subset_r = split(dataset)
        candidate_l = divide_and_conquer(subset_l)
        candidate_r = divide_and_conquer(subset_r)
        return merge(dataset, candidate_l, candidate_r)
    else:
        return synthesize(dataset, s0)
```
The "split" step corresponds to evenly splitting the positive and negative examples. Sub-patterns are then synthesized on the smaller subsets. The conquer, or "merge", step requires synthesizing the pattern again on the combined data set, but the sub-patterns are reused in two ways to speed up this search.

First, viewing a sub-pattern as an AST, its low-level sub-trees up to a certain depth threshold are added to the syntax as new production options for the corresponding non-terminals at the sub-trees' roots. They can then serve as shortcuts for likely building blocks. Second, the sub-patterns' skeletons left after removing these sub-trees are used as seeds for higher-level searches, serving as shortcuts for likely overall structures. Both are given complexity rewards to encourage reuse.

In practice, many search results can be directly reused from cached results generated from previous tasks on similar subsets. This optimization can further reduce the synthesis time.

**Simulated Annealing.** When searching for local patterns at lower levels, we require the enumerator to find not one but t candidate patterns for each subset. Such searches are fast for smaller data sets and can cover a wider range of possible patterns. As the search moves to higher levels and larger data sets, we discard the least accurate local patterns and also reduce t. The search then focuses on refining the currently optimal global pattern. This idea is based on traditional simulated annealing algorithms and helps to improve the synthesizer's performance in many cases.

**Parallelization.** Most steps in the synthesis process are inherently parallelizable. They include (1) doing synthesis on different subsets of data, (2) exploring different programs in the enumeration, (3) verifying different programs found so far, (4) executing a program on different data points during the verification.

We focus less on optimizing (1) and (2) since they are not the performance bottlenecks, and instead parallelize (3) and (4) over multiple cores. In our implementation, using 5 machines with 32 cores each, we devote one thread each to tasks (1) and (2) on one machine, 64 threads on the same machine to task (3), and 512 threads distributed over the remaining four machines to task (4). The distributed version is approximately two orders of magnitude faster than the single-threaded version for complex tasks. Given more computing power, a proportional speedup can be expected.

### **5 Evaluation**

We implemented Sharingan in 10K lines of C++ code. Our experiments are carried out on a cluster of five machines directly connected by Ethernet cables, each with 32 Intel(R) Xeon(R) E5-2450 CPUs running at 2.10 GHz. The arrangement of tasks is explained in the last part of Section 4.3. We evaluate, in order, the following aspects of Sharingan: minimal feature engineering (5.1), accuracy (5.2), interpretability and editability (5.3), efficient implementation (5.4), and synthesis algorithm efficiency (5.5).

### **5.1 Data Preparation**

We utilize eight types of attacks from the CICIDS2017 database [25,7], a public repository of benign and attack traffic used for evaluating intrusion detection systems. They cover a wide range of attack traffic, including botnets, denial of service (DoS), port scanning, and password cracking.

The data is labelled per flow by an attack type or "Benign". We learn each type of attack against benign traffic separately. To use as much data as possible, for each attack type, we use 1500 positive (attack) flows and 10000 negative (benign) flows for training, and another distinct data set of similar size for testing.

The main benefit of Sharingan in this step is the minimal need for feature engineering. We simply use all header fields of TCP and IP, plus the inter-packet arrival time between adjacent packets in the same flow, as features. In total, there are 19 features per packet and N × 19 features per trace of length N.

In contrast, other state-of-the-art systems rely on a carefully designed feature extraction step to work well. For example, the feature vectors included in CICIDS2017 database contain 84 features extracted by the CICFlowMeter [9,13] tool for each flow, characterizing performance metrics of the entire flow such as duration, mean forward packet length, min activation time, etc. Kitsune [16] extracts bandwidth information over the past short periods as packet-level features. DECANTeR [6] uses HTTP-level properties such as constant header fields, language, amount of outgoing information, etc. as flow-level features.

#### **5.2 Learning Accuracy**

We next validate Sharingan's learning accuracy using the following evaluation methodology. For each individual attack type, we use the training data (attack and normal traffic) as input to Sharingan to learn a NetQRE program. The NetQRE program is then validated on the corresponding testing set for accuracy. The output of Sharingan includes a NetQRE program that maps a network trace to an integer output, together with a recommended range for the threshold. By modifying the threshold, the true positive rate (TP) and false positive rate (FP) can be adjusted, as we explain later in Section 5.3. We use the AUC-ROC metric (Area Under the Receiver Operating Characteristic curve), a standard statistical measure of classification performance.

Fig. 8: Sharingan's true positive rate under low false positive rate, AUC-ROC and learning rate for 8 attacks in CICIDS2017 (higher is better)

Figure 8 contains results for eight types of attacks. Apart from AUC-ROC values, we also show the true positive rates when the false positive rate is adjusted to three different levels: 0.001, 0.01, and 0.03. Given that noise is common in most network traffic, the last metric shown in Figure 8 is the highest achievable learning rate.

Overall, we observe that Sharingan performs well across a range of attacks, with accuracy numbers on par with prior state-of-the-art systems such as Kitsune, which has an average AUC-ROC value of 0.924 on nine types of IoT-based attacks, and DECANTeR, which has an average detection rate of 97.7% and a false positive rate of 0.9% on HTTP-based malware. In six out of eight attacks, Sharingan achieves an AUC-ROC above 0.994 and a 100% true positive rate at a 1% false positive rate. The major exception is Botnet ARES, which consists of a mix of malicious attack vectors. Handling such multi-vector attacks is an avenue for future work.

#### **5.3 Post-processing and Interpretation**

One of the benefits of Sharingan is that it generates an actual classification program that can be further adapted and tuned by a network operator. The program itself is also close to the stateful nature of session-layer protocols and attacks, and thus is readable and provides a basis for the operator to understand the attack cause. We briefly illustrate these capabilities in this section.

**FP-TP Tradeoff** Network operators occasionally need to tune a classifier's sensitivity to false positives and true positives. Sharingan generates a NetQRE program with a threshold T, which can be adjusted to vary the false positive and true positive rates. Figures 9 and 10 show the output distributions of positive and negative examples in the DoS Hulk attack; A denotes the largest negative output and B the smallest positive output. When A > B, no threshold separates the two classes perfectly, so some error is unavoidable. We can slide the threshold T from B to A and obtain an ROC curve for the test data, as illustrated in Figure 11.

Fig. 10: Output distribution of test set (DoS Hulk)

Fig. 11: ROC curve, logarithmic scale (DoS Hulk)
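The threshold sweep described above can be sketched as follows; the output values are invented for illustration and do not come from the DoS Hulk data.

```python
# Sketch of the FP-TP tradeoff: slide a threshold T over the
# NetQRE outputs of positive (attack) and negative (normal) test
# flows. Output values are made up for illustration.

def roc_points(pos_out, neg_out):
    A = max(neg_out)            # largest negative output
    B = min(pos_out)            # smallest positive output
    pts = []
    for T in sorted(set(pos_out + neg_out)):
        tp = sum(o > T for o in pos_out) / len(pos_out)
        fp = sum(o > T for o in neg_out) / len(neg_out)
        pts.append((T, fp, tp))
    return A, B, pts

A, B, pts = roc_points([5, 7, 9], [1, 2, 6])
print(A > B)   # True: here some classification error is unavoidable
```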

**Interpretation** We describe a learnt NetQRE program to demonstrate how a network operator can interpret the classifiers.<sup>3</sup> The NetQRE program synthesized by Sharingan for the DDoS task above is:

```
( ( /_* A _* B _*/ )*sum /_* C _*/ )sum > 4
Where
A = [ip.src_ip ->[0%,50%]] B = [tcp.rst==1]
C = [time_since_last_pkt <=50%]
```
DDoS is a flood attack launched from a botnet of machines to exhaust memory resources on the victim server. The detected pattern consists of packets that start with a source IP in a certain range, followed by a packet with the reset bit set to 1, and then a packet arriving within a short interval of its predecessor. Finally, the program considers a flow a match if these patterns appear with a total count of over 4.

The range of source IP addresses specified in the pattern likely contains botnet IP addresses. Attack flows are often reset when the load cannot be handled or the flows' states cannot be recognized, which indicates the attack was successfully launched. Packets arriving at short intervals further suggest a flood attack. Unique properties of the DDoS attack are indeed captured by this program!
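As a loose illustration of how such a learnt program classifies a flow, one can think of the matching as counting pattern occurrences against the threshold. The sketch below simplifies NetQRE's actual quantitative-regex semantics considerably, and the packet field names are hypothetical.

```python
# Loose illustration only: count non-overlapping "A then B" packet
# pairs, add one if some packet matches C, and flag the flow when
# the total exceeds the learnt threshold of 4. Field names are
# hypothetical; this is not NetQRE's actual semantics.

def is_A(p): return p["src_ip_pct"] <= 50      # source IP in low range
def is_B(p): return p["tcp_rst"] == 1          # reset bit set
def is_C(p): return p["ms_since_last"] <= 50   # short inter-arrival time

def classify(flow, threshold=4):
    count, want = 0, "A"
    for p in flow:
        if want == "A" and is_A(p):
            want = "B"
        elif want == "B" and is_B(p):
            count += 1
            want = "A"
    if any(is_C(p) for p in flow):
        count += 1
    return count > threshold

flow = [{"src_ip_pct": 10, "tcp_rst": 0, "ms_since_last": 100},
        {"src_ip_pct": 90, "tcp_rst": 1, "ms_since_last": 100}] * 5 \
     + [{"src_ip_pct": 90, "tcp_rst": 0, "ms_since_last": 10}]
print(classify(flow))   # True: 5 A-B pairs plus a C packet exceed 4
```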

**Refinement by Human Knowledge** Finally, an advantage of generating a program for classification is that it enables the operator to augment the generated NetQRE program with domain knowledge before deployment. For example, in the DDoS case, if the operator knows that the victim service is purely TCP-based, they can append [ip.type = TCP] to all predicates. Alternatively, if they know that the victim service is designed for 1000 requests per second, they can explicitly replace the learnt arrival-time interval with 1 ms. The modified program is then:

```
( ( /_* A _* B _*/ )*sum /_* C _*/ )sum > 4
Where
A = [ip.type = TCP]&&[ip.src_ip ->[0%,50%]]
B = [ip.type = TCP]&&[tcp.rst==1]
C = [ip.type = TCP]&&[time_since_last_pkt <=1ms]
```
<sup>3</sup> A full list of learnt NetQRE programs can be found in our tech report https://arxiv.org/abs/2010.06135.
#### **5.4 Deployment Scenarios**

We now describe three ways for network operators to deploy the output of Sharingan: (1) taking action hinted by the interpretation; (2) directly executing the NetQRE program as a monitoring system; and (3) translating the NetQRE program to rules in other monitoring systems.

Revisiting the DDoS example in Section 5.3, in the first case, the operator may refine the source IP part to determine the exact range of attacker machines and block them.

If the NetQRE program itself is to be used as a monitoring system, its runtime system can be directly deployed on any general-purpose machine. Prior work [37] has shown that NetQRE achieves performance comparable to optimized low-level implementations. Moreover, these programs can be easily compiled into other formats acceptable to existing monitoring systems.

#### **5.5 Program Synthesis Performance**

**Synthesis time:** In our final experiment, we measure Sharingan's performance in terms of the time needed for program synthesis.

Figure 12 plots program complexity (Y-axis) against synthesis (learning) time in minutes. Not surprisingly, complex programs require more time to synthesize. We further observe that Sharingan is able to synthesize complex programs of 20-30 terms, mostly within minutes to an hour, which is practical for many real-world use cases and can be further reduced through parallelism over more machines. As a comparison, Kitsune reports training times between 8 minutes and 52 minutes on individual attacks [16], and DECANTeR reports training times between 5 hours and 10 hours on individual users' data [6].

Fig. 12: Time-complexity relation

Fig. 13: Impact of optimizations on synthesis performance

**Effectiveness of Optimizations.** We explore the effectiveness of the individual optimization strategies described in Section 4. In Figure 13, we compare

the synthesis time and the number of programs searched by a fully optimized Sharingan against results obtained when each optimization is disabled. SSH Patator is used as the demonstration example since it is moderately complex.

We observe that disabling the partial execution optimization makes both metrics significantly worse: being able to prune early greatly reduces time wasted on unnecessary exploration and checking. Disabling merge search decreases the number of programs searched, but increases the total synthesis time because of the overhead of checking each program against the entire data set. Synthesis cannot finish within reasonable time if both are disabled.

In summary, all optimization strategies are effective at speeding up the synthesis process. A synthesis task that otherwise cannot finish within practical time can now be completed in less than 15 minutes.

### **6 Related Work**

**Auto-Generation of Network Configurations.** Broadly speaking, a network traffic classification rule is a type of network configuration, and other lines of research aim at automatically generating other categories of network configurations. EasyACL [15] synthesizes access control lists (ACLs) from natural language descriptions. NetGen [24], NetComplete [10] and Genesis [32] synthesize data plane routing configurations from policy specifications using SMT solvers. NetEgg [36] instead generates routing configurations interactively from user-provided examples. Sharingan focuses on network traffic classification and thus has a different target.

**Other Learning-based Systems.** Apart from competing systems we explicitly compared to above, there are other learning-based systems under different settings from Sharingan.

Unsupervised learning systems are useful for recognizing outliers and other types of "abnormal" flows [17,38,35], most notably in intrusion detection systems. Their ability to differentiate unknown types of traffic from known ones cannot be replaced by Sharingan; rather, Sharingan can augment unsupervised learning systems by reducing the effort required to analyze the reported traces.

Learning systems that model traffic as state machines [18] or as regular expressions over payload strings [34] share the advantage of requiring minimal feature engineering. The former generates less succinct models than Sharingan and is typically used for verification of network protocols; the latter learns patterns at the individual packet level rather than the session level.

There are also state-of-the-art point solutions that focus on specific scenarios rather than general-purpose network traffic classification. For example, PrivateEye focuses on detecting privacy breaches in the cloud [4], and RFDIDS addresses intrusion detection challenges unique to power systems [26].

**Syntax-Guided Synthesis.** Sharingan builds on a large body of work on syntax-guided synthesis [11,21,23,20,22,29,27]. However, the synthesis techniques proposed in this paper go beyond the state of the art and have the potential to be applied in other applications of program synthesis.

Partial execution is similar to the overestimation idea in [14] (see also the follow-ups [29,30,33]), where the system learns plain regular expressions and overestimates the feasibility of a non-terminal with a Kleene star. However, no prior work proposes an overestimation algorithm for a quantitative stream query language like NetQRE, nor does any consider a specification format for a classifier program with unknown numerical thresholds.

A divide-and-conquer strategy similar to merge search was proposed in [3] for optimizing program synthesis. It focuses on standard SyGuS tasks based on logical constraints and uses a decision tree to combine sub-patterns instead of trying to merge them into one compact program. The merge search proposed in this work is not specific to Sharingan, and can be used in other synthesis tasks to enable the handling of large data sets.

Finally, there is no prior work that uses program synthesis alone to perform accurate, real-world, large-scale classification. The closest work synthesizes simple low-accuracy programs as weak learners [8] and requires a separate SVM to assemble them into a classifier.

### **7 Conclusion**

This paper presents Sharingan, which develops syntax-guided synthesis techniques to automatically generate NetQRE programs for classifying session-layer network traffic. Sharingan can be used to generate network monitoring queries or signatures for intrusion detection systems from labeled traces. Our results demonstrate three key value propositions of Sharingan: minimal feature engineering, efficient implementation, and interpretability as well as editability. While achieving accuracy comparable to state-of-the-art statistical and signature-based learning systems, Sharingan is significantly more usable, and its synthesis time is practical for real-world tasks.<sup>4</sup>

### **Acknowledgements**

We thank the anonymous reviewers for their feedback. This research was supported in part by NSF grants CCF 1763514 and CNS 1513679, and by Accountable Protocol Customization under the ONR TPCP program with grant number N00014-18-1-2618.

### **References**

1. Rajeev Alur, Rastislav Bodik, Garvit Juniwal, Milo MK Martin, Mukund Raghothaman, Sanjit A Seshia, Rishabh Singh, Armando Solar-Lezama, Emina Torlak, and Abhishek Udupa. Syntax-guided synthesis. In *2013 Formal Methods in Computer-Aided Design*, pages 1–8. IEEE, 2013.

<sup>4</sup> Sharingan's code is publicly available at https://github.com/SleepyToDeath/NetQRE.


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/ 4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **General Decidability Results for Asynchronous Shared-Memory Programs: Higher-Order and Beyond**

Rupak Majumdar, Ramanathan S. Thinniyam, and Georg Zetzsche

Max Planck Institute for Software Systems (MPI-SWS), Kaiserslautern, Germany {rupak,thinniyam,georg}@mpi-sws.org

**Abstract.** The model of asynchronous programming arises in many contexts, from low-level systems software to high-level web programming. We take a language-theoretic perspective and show general decidability and undecidability results for asynchronous programs that capture all known results and establish decidability for new and important classes. As a main consequence, we show decidability of safety, termination and boundedness verification for *higher-order* asynchronous programs—such as OCaml programs using Lwt—and undecidability of liveness verification already for order-2 asynchronous programs. We show that under mild assumptions, surprisingly, safety and termination verification of asynchronous programs with handlers from a language class are decidable *iff* emptiness is decidable for the underlying language class. Moreover, we show that configuration reachability and liveness (fair termination) verification are equivalent, and decidability of these problems implies decidability of the well-known "equal-letters" problem on languages. Our results close the decidability frontier for asynchronous programs.

**Keywords:** Higher-order asynchronous programs · Decidability

### **1 Introduction**

Asynchronous programming is a common way to manage concurrent requests in a system. In this style of programming, rather than waiting for a time-consuming operation to complete, the programmer can make asynchronous procedure calls which are stored in a task buffer pending later execution. Each asynchronous procedure, or handler, is a sequential program. When run, it can change the global shared state of the program, make internal synchronous procedure calls, and post further instances of handlers to the task buffer. A scheduler repeatedly and non-deterministically picks pending handler instances from the task buffer and executes their code atomically to completion. Asynchronous programs appear in many domains, such as operating system kernel code, web programming,

This research was sponsored in part by the Deutsche Forschungsgemeinschaft project 389792660 TRR 248–CPEC and by the European Research Council under the Grant Agreement 610150 (ERC Synergy Grant ImPACT).

© The Author(s) 2021

J. F. Groote and K. G. Larsen (Eds.): TACAS 2021, LNCS 12651, pp. 449–467, 2021. https://doi.org/10.1007/978-3-030-72016-2_24

or user applications on mobile platforms. This style of programming is supported natively or through libraries for most programming environments. The interleaving of different handlers hides latencies of long-running operations: the program can process a different handler while waiting for an external operation to finish. However, asynchronous scheduling of tasks introduces non-determinism in the system, making it difficult to reason about correctness.

An asynchronous program is finite-data if all program variables range over finite domains. Finite-data programs are still infinite-state transition systems: the task buffer can contain an unbounded number of pending instances, and the sequential machine implementing an individual handler can have unboundedly large state (e.g., if the handler is given as a recursive program, the stack can grow unboundedly). Nevertheless, verification problems for finite-data programs have been shown to be decidable for several kinds of handlers [12,30,20,6]. Several algorithmic approaches have been studied, tailored to (i) the kinds of permitted handler programs and (ii) the properties that are checked.

**State of the art** We briefly survey the existing approaches and what is known about the decidability frontier. The Parikh approach applies to (first-order) recursive handler programs. Here, the decision problems for asynchronous programs are reduced to decision problems over Petri nets [12]. The key insight is that since handlers are executed atomically, the order in which a handler posts tasks to the buffer is irrelevant. Therefore, instead of considering the sequential order of posted tasks along an execution, one can equivalently consider its Parikh image. Thus, when handlers are given as pushdown systems, the behaviors of an asynchronous program can be represented by a (polynomial-sized) Petri net. Using the Parikh approach, safety (formulated as reachability of a global state), termination (whether all executions terminate), and boundedness (whether there is an a priori upper bound on the size of the task buffer) are all decidable for asynchronous programs with recursive handlers, by reduction to corresponding problems on Petri nets [30,12]. Configuration reachability (reachability of a specific global state and task buffer configuration), fair termination (termination under a fair scheduler), and fair non-starvation (every pending handler instance is eventually executed) are also decidable, by separate ad hoc reductions to Petri net reachability [12]. A "reverse reduction" shows that Petri nets can be simulated by polynomial-sized asynchronous programs (already with finite-data handlers).
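The key insight of the Parikh approach, that only the multiset of posted handlers matters, can be phrased as a one-line sketch using Python's `Counter` as the multiset type:

```python
# Parikh image of a word over the alphabet of handler names: the
# multiset of letter counts, forgetting order. Executions that post
# "abab" and "aabb" load the task buffer identically.

from collections import Counter

def parikh(word):
    return Counter(word)

print(parikh("abab") == parikh("aabb"))   # True: order is irrelevant
```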

In the downclosure approach, one replaces each handler with a finite-data program that is equivalent up to "losing" handlers in the task buffer. Of course, this requires that one can compute equivalent finite-data programs for given handler programs. This has been applied to checking safety for recursive handler programs [3]. Finally, a bespoke rank-based approach has been applied to checking safety when handlers can perform restricted higher-order recursion [6].

**Contribution** Instead of studying individual kinds of handler programs, we consider asynchronous programs in a general language-theoretic framework. The class of handler programs is given as a language class C: An asynchronous program over a language class C is one where each handler defines a language from C over the alphabet of handler names, as well as a transformer over the global state. This view leads to general results: we can obtain simple characterizations of which classes of handler programs permit decidability. For example, we do not need the technical assumptions of computability of equivalent finite-data programs from the Parikh and the downclosure approach.

Our first result shows that, under a mild language-theoretic assumption, safety and termination are decidable if and only if the underlying language class C has a decidable emptiness problem.<sup>1</sup> Similarly, we show that boundedness is decidable iff finiteness is decidable for the language class C. These results are the best possible: decidability of emptiness (resp. finiteness) is already required for verifying the safety or termination (resp. boundedness) of a single sequential handler call. As corollaries, we get new decidability results for all these problems for asynchronous programs over higher-order recursion schemes, which form the language-theoretic basis for programming in higher-order functional languages such as OCaml [21,28], as well as for other language classes (lossy channel languages, Petri net languages, etc.).

Second, we show that configuration reachability, fair termination, and fair non-starvation are mutually reducible; thus, decidability of any one of them implies decidability of all of them. We also show that decidability of these problems implies decidability of a well-known combinatorial problem on languages: given a language over the alphabet {a, b}, decide whether it contains a word with an equal number of as and bs. Viewed contrapositively, we conclude that all these decision problems are undecidable already for asynchronous programs over order-2 pushdown languages, since the equal-letters problem is undecidable for this class.
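The equal-letters problem itself is easy to state; the sketch below checks it on an explicitly listed finite sample of a language. The hardness results concern infinite languages given by devices such as order-2 pushdown automata, where no such enumeration is possible.

```python
# Equal-letters check over {a, b} for an explicit finite word list.
# This only illustrates the problem statement; for order-2 pushdown
# languages the problem is undecidable.

def has_equal_letters(words):
    return any(w.count("a") == w.count("b") for w in words)

print(has_equal_letters(["aab", "abab"]))   # True: "abab" has 2 of each
print(has_equal_letters(["a", "abb"]))      # False
```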

Together, our results "close" the decidability frontier for asynchronous programs, by demonstrating reducibilities between decision problems heretofore studied separately and connecting decision problems on asynchronous programs with decision problems on the underlying language classes of their handlers.

While our algorithms do not assume that downclosures are effectively computable, we use downclosures to prove their correctness. We show that safety, termination, and boundedness problems are invariant under taking downclosures of runs; this corresponds to taking downclosures of the languages of handlers.

The observation that safety, termination, and boundedness depend only on the downclosure suggests a possible route to implementation. If there is an effective procedure to compute the downclosure for class C, then a direct verification algorithm would replace all handlers by their (regular) downclosures, and invoke existing decision procedures for this case. Thus, we get a direct algorithm based on downclosure constructions for higher-order recursion schemes, using the string of celebrated recent results on effectively computing the downclosures of word schemes [33,15,7].

We find our general decidability result for asynchronous programs to be surprising. Already for regular languages, the complexity of safety verification jumps

<sup>1</sup> The "mild language-theoretic assumption" is that the class of languages forms an effective full trio: it is closed under intersections with regular languages, homomorphisms, and inverse homomorphisms. Many language classes studied in formal language theory and verification satisfy these conditions.

from NL (NFA emptiness) to EXPSPACE (Petri net coverability): asynchronous programs are far more expressive than individual handler languages. It is therefore surprising that safety and termination verification remains decidable whenever it is decidable for individual handler languages.

Full proofs of our results are available in the full version [25].

### **2 Preliminaries**

Basic Definitions We assume familiarity with basic definitions of automata theory (see, e.g., [18,31]). The projection of a word w onto an alphabet Σ′, written Proj_{Σ′}(w), is the word obtained by erasing from w each symbol that does not belong to Σ′. For a language L, define Proj_{Σ′}(L) = {Proj_{Σ′}(w) | w ∈ L}. The subword order ≼ on Σ* is defined by w′ ≼ w for w, w′ ∈ Σ* if w′ can be obtained from w by deleting some letters. For example, abba ≼ bababa but abba ⋠ baaba. The downclosure ↓w of a word w ∈ Σ* with respect to the subword order is ↓w := {w′ ∈ Σ* | w′ ≼ w}. The downclosure ↓L of a language L ⊆ Σ* is given by ↓L := {w′ ∈ Σ* | ∃w ∈ L: w′ ≼ w}. Recall that the downclosure ↓L of any language L is a regular language [17].
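The subword test underlying these definitions is a scattered-subsequence check, sketched here:

```python
# w' ≼ w iff w' arises from w by deleting letters: scan w once,
# greedily matching the letters of w' in order.

def is_subword(wp, w):
    it = iter(w)
    return all(c in it for c in wp)   # "c in it" consumes the iterator

print(is_subword("abba", "bababa"))   # True, as in the text
print(is_subword("abba", "baaba"))    # False
```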

A multiset **m**: Σ → ℕ over Σ maps each symbol of Σ to a natural number. Let M[Σ] be the set of all multisets over Σ. We treat sets as the special case of multisets where each element is mapped to 0 or 1. As an example, we write **m** = ⟦a, a, c⟧ for the multiset **m** ∈ M[{a, b, c, d}] such that **m**(a) = 2, **m**(b) = **m**(d) = 0, and **m**(c) = 1. We also write |**m**| = ∑_{σ∈Σ} **m**(σ).

Given two multisets **m**, **m**′ ∈ M[Σ], we define the multiset **m** ⊕ **m**′ ∈ M[Σ] by (**m** ⊕ **m**′)(a) = **m**(a) + **m**′(a) for all a ∈ Σ. We also define the natural order ≼ on M[Σ] as follows: **m** ≼ **m**′ iff there exists **m**_Δ ∈ M[Σ] such that **m** ⊕ **m**_Δ = **m**′. For **m** ≼ **m**′ we analogously define **m**′ ⊖ **m**: for all a ∈ Σ, (**m**′ ⊖ **m**)(a) = **m**′(a) − **m**(a). For Σ ⊆ Σ′ we regard **m** ∈ M[Σ] as a multiset of M[Σ′] in which undefined values are sent to 0.
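These multiset operations map directly onto Python's `Counter` (a sketch; `Counter` arithmetic drops non-positive counts, which is harmless here since ⊖ is only applied when **m** ≼ **m**′):

```python
# Multisets over Σ as Counters: ⊕ is pointwise sum, ≼ pointwise
# comparison, ⊖ pointwise difference (defined only when m ≼ m').

from collections import Counter

def oplus(m1, m2):
    return m1 + m2                      # Counter addition is pointwise

def leq(m1, m2):                        # the natural order on M[Σ]
    return all(m1[a] <= m2[a] for a in m1)

def ominus(m2, m1):                     # m2 ⊖ m1, assuming m1 ≼ m2
    assert leq(m1, m2)
    return Counter({a: m2[a] - m1[a] for a in m2})

m = Counter("aac")                      # the multiset [[a, a, c]]
print(m["a"], m["b"], m["c"])           # 2 0 1
```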

Language Classes and Full Trios A language class is a collection of languages, together with some finite representation. Examples are the regular (e.g. represented by finite automata) or the context-free languages (e.g. represented by pushdown automata or PDA). A relatively weak and reasonable assumption on a language class is that it is a full trio, that is, it is closed under each of the following operations: taking intersection with a regular language, taking homomorphic images, and taking inverse homomorphic images. Equivalently, a language class is a full trio iff it is closed under rational transductions [5].

We assume that all full trios C considered in this paper are effective: given a language L from C, a regular language R, and a homomorphism h, we can compute representations of the languages L ∩ R, h(L), and h⁻¹(L) in C.

Many classes of languages studied in formal language theory form effective full trios. Examples include the regular and the context-free languages [18], the indexed languages [2,10], the languages of higher-order pushdown automata [26], higher-order recursion schemes (HORS) [16,9], Petri nets [14,19], and lossy channel systems (see Section 4.1). (While HORS are usually viewed as representing a tree or collection of trees, one can also view them as representing a word language, as we explain in Section 5.)

Informally, a language class defined by non-deterministic devices with a finite-state control that allows ε-transitions and imposes no restriction tying input letters to the configuration changes performed (such as non-deterministic pushdown automata) is always a full trio: the three operations above can be realized by simple modifications of the finite-state control. The deterministic context-free languages are a class that is not a full trio.

Asynchronous Programs: A Language-Theoretic View We use a language-theoretic model for asynchronous shared-memory programs.

**Definition 1.** Let C be an (effective) full trio. An asynchronous program (AP) over C is a tuple P = (D, Σ, (L_c)_{c∈C}, d₀, **m**₀), where D is a finite set of global states, Σ is an alphabet of handler names, (L_c)_{c∈C} is a family of languages from C, one for each context c ∈ C, where C = D × Σ × D is the set of contexts, d₀ ∈ D is the initial state, and **m**₀ ∈ M[Σ] is a multiset of initially pending handler instances.

A configuration (d, **m**) ∈ D × M[Σ] of P consists of a global state d and a multiset **m** of pending handler instances. For a configuration c, we write c.d and c.**m** for the global state and the multiset of the configuration, respectively. The initial configuration c₀ of P is given by c₀.d = d₀ and c₀.**m** = **m**₀. The semantics of P is given as a labeled transition system over the set of configurations, with the transition relations −σ→ ⊆ (D × M[Σ]) × (D × M[Σ]) for σ ∈ Σ, given by

(d, **m** ⊕ ⟦σ⟧) −σ→ (d′, **m** ⊕ **m**′) iff ∃w ∈ L_{dσd′}: Parikh(w) = **m**′

We write →∗ for the reflexive transitive closure of the transition relation. A configuration c is said to be reachable in P if (d₀, **m**₀) →∗ c.

Intuitively, the set Σ of handler names specifies a finite set of procedures that can be invoked asynchronously. The shared state takes values in D. When a handler is called asynchronously, it gets added to a bag of pending handler calls (the multiset **m** in a configuration). The language L_{dσd′} captures the effect of executing an instance of σ starting from global state d, such that on termination the global state is d′. Each word w ∈ L_{dσd′} captures a possible sequence of handlers posted during the execution.

Suppose the current configuration is (d, **m**). A non-deterministic scheduler picks one of the outstanding handlers σ ∈ **m** and executes it. Executing σ corresponds to picking one of the languages L_{dσd′} and some word w ∈ L_{dσd′}. Upon execution of σ, the new configuration has global state d′, and the new bag of pending calls is obtained from **m** by removing an instance of σ and adding the Parikh image of w. This reflects the current set of pending handler calls: the old ones (minus an instance of σ) together with the new ones added by executing σ. Note that a handler is executed atomically; thus, we atomically update the global state and apply the effect of executing the handler.
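A single scheduler step can be sketched directly from this description. In the sketch, each L_{dσd′} is a toy finite language listed explicitly; real handler languages come from the class C and are generally infinite. The handler names and states are invented for illustration.

```python
# One nondeterministic scheduler step of an asynchronous program.
# effects[(d, sigma)] lists pairs (d', w): executing sigma in global
# state d may end in state d' having posted the handlers in w.

import random
from collections import Counter

effects = {
    ("idle", "s"): [("idle", "ab")],   # s posts one a and one b
    ("idle", "a"): [("busy", "")],
    ("busy", "b"): [("idle", "")],
}

def step(d, m):
    enabled = [s for s in m if m[s] > 0 and (d, s) in effects]
    if not enabled:
        return None
    sigma = random.choice(enabled)              # scheduler's choice
    d2, w = random.choice(effects[(d, sigma)])  # choice of word in L
    m2 = m - Counter({sigma: 1}) + Counter(w)   # swap sigma for Parikh(w)
    return d2, m2

d, m = step("idle", Counter({"s": 1}))
print(d, sorted(m.elements()))   # idle ['a', 'b']
```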

Let us see some examples of asynchronous programs. It is convenient to present these examples in a programming language syntax, and to allow each

```
1 global var turn = ref 0 and x = ref 0;
2 let rec s1 () = if * then begin post a; s1(); post b end
3 let rec s2 () = if * then begin post a; s2(); post b end else post b
4 let a () = if !turn = 0 then begin turn := 1; x := !x + 1 end else post a
5 let b () = if !turn = 1 then begin turn := 0; x := !x - 1 end else post b
6
7 let s3 () = post s3; post s3
8
9 global var t = ref 0;
10 let c () = if !t = 0 then t := 1 else post c
11 let d () = if !t = 1 then t := 2 else post d
12 let f () = if !t = 2 then t := 0 else post f
13
14 let cc x = post c; x
15 let dd x = post d; x
16 let ff x = post f; x
17 let id x = x
18 let h g y = cc (g (dd y))
19 let rec produce g x = if * then produce (h g) (ff x) else g x
20 let s4 () = produce id ()
```
**Fig. 1.** Examples of asynchronous programs

handler to have internal actions that perform local tests and updates to the global state. As we describe informally below, and formally in the full version, when C is a full trio, internal actions can be "compiled away" by taking an intersection with a regular language of internal actions and projecting the internal actions away. Thus, we use our simpler model throughout.

Examples Figure 1 shows some simple examples of asynchronous programs in an OCaml-like syntax. Consider first the asynchronous program in lines 1–5. The alphabet of handlers is s1, s2, a, and b. The global states correspond to possible valuations of the global variables turn and x; assuming turn is a Boolean and x takes values in ℕ, we have D = {0, 1} × {0, 1, ω}, where ω abstracts all values other than {0, 1}. Since s1 and s2 do not touch any variables, for d, d′ ∈ D we have L_{d,s1,d} = {aⁿbⁿ | n ≥ 0}, L_{d,s2,d} = {aⁿbⁿ⁺¹ | n ≥ 0}, and L_{d,s1,d′} = L_{d,s2,d′} = ∅ if d ≠ d′.

For the languages corresponding to a and b, we use syntactic sugar in the form of internal actions; these are local tests and updates of the global state. For our example we have, e.g., L_{(0,0),a,(1,1)} = {ε} and L_{(1,x),a,(1,x)} = {a} for all values of x, and similarly for b. The meaning is that, starting from global state (0, 0), executing the handler leads to global state (1, 1) and no handlers are posted, whereas starting from a global state in which turn is 1, executing the handler keeps the global state unchanged but posts an instance of a. Note that all the languages are context-free.

Consider an execution of the program from the initial configuration ((0, 0), ⟦s1⟧). The execution of s1 puts n as and n bs into the bag, for some n ≥ 0. The global variable turn is used to ensure that the handlers a and b alternately update x. When turn is 0, the handler for a increments x and sets turn to 1; otherwise it re-posts itself for a future execution. Likewise, when turn is 1, the handler for b decrements x and sets turn back to 0; otherwise it re-posts itself for a future execution. As a result, the variable x never grows beyond 1. Thus, the program satisfies the safety property that no execution sets x to ω.
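The invariant that x never grows beyond 1 can be checked with a small simulation of the a/b handlers (a sketch; the scheduler here picks pending handlers at random):

```python
# Random-scheduler simulation of handlers a and b from Fig. 1:
# turn forces strict alternation, so x stays within {0, 1}.

import random

def run(n_pairs, steps=200, seed=0):
    rng = random.Random(seed)
    turn, x = 0, 0
    bag = ["a"] * n_pairs + ["b"] * n_pairs   # as posted by s1
    seen = {x}
    for _ in range(steps):
        if not bag:
            break
        h = rng.choice(bag)
        bag.remove(h)
        if h == "a":
            if turn == 0: turn, x = 1, x + 1
            else: bag.append("a")             # re-post for later
        else:
            if turn == 1: turn, x = 0, x - 1
            else: bag.append("b")             # re-post for later
        seen.add(x)
    return seen

print(run(3) <= {0, 1})   # True: x never reaches the abstract value ω
```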

It is possible that the execution goes on forever: for example, if s1 posts an a and a b, and thereafter only b is chosen by the scheduler. This is not an "interesting" infinite execution as it is not fair to the pending a. In the case of a fair scheduler, which eventually always picks an instance of every pending task, the program terminates: eventually all the as and bs are consumed when they are scheduled in alternation. However, if instead we started with s2, the program will not terminate even under a fair scheduler: the last remaining b will not be paired and will keep executing and re-posting itself forever.

Now consider the execution of s3. It has an infinite fair run, where the scheduler picks an instance of s3 at each step. However, the number of pending instances grows without bound. We shall study the boundedness problem, which checks if the bag can become unbounded along some run. We also study a stronger notion of fair termination, called fair non-starvation, which asks that every instance of a posted handler is executed under any fair scheduler. The execution of s3 is indeed fair, but there can be a specific instance of s3 that is never picked: we say s3 can starve an instance.

The program in lines 9–20 is higher-order (produce and h take functions as arguments). The language of s4 is the set $\{c^n d^n f^n \mid n \geq 0\}$, that is, it posts an equal number of c's, d's, and f's. It is an indexed language; we shall see (Section 5) how this and other higher-order programs can be represented using higher-order recursion schemes (HORS). Note that the OCaml types of produce : (o → o) → o → o and h : (o → o) → o → o are higher-order.

The program is similar to the first: the handlers c, d, and f execute in "round robin" fashion using the global state t to find their turns. Again, we use internal actions to update the global state for readability. We ask the same decision questions as before: does the program ever reach a specific global state and does the program have an infinite (fair) run? We shall see later that safety and termination questions remain decidable, whereas fair termination does not.

### **3 Decision Problems on Asynchronous Programs**

We now describe decision problems on runs of asynchronous programs.

Runs, preruns, and downclosures A prerun of an AP $P = (D, \Sigma, (L_c)_{c \in C}, d_0, \mathbf{m}_0)$ is a finite or infinite sequence $\rho = (e_0, \mathbf{n}_0), \sigma_1, (e_1, \mathbf{n}_1), \sigma_2, \ldots$ of alternating tuples $(e_i, \mathbf{n}_i) \in D \times \mathbb{M}[\Sigma]$ and symbols $\sigma_i \in \Sigma$. The set of preruns of $P$ is denoted $\mathrm{Preruns}(P)$. Note that if two asynchronous programs $P$ and $P'$ have the same $D$ and $\Sigma$, then $\mathrm{Preruns}(P) = \mathrm{Preruns}(P')$. The length $|\rho|$ of a finite prerun $\rho$ is the number of configurations in $\rho$. The $i$-th configuration of a prerun $\rho$ is denoted $\rho(i)$.

We define an order $\preceq$ on preruns as follows: for preruns $\rho = (e_0, \mathbf{n}_0), \sigma_1, (e_1, \mathbf{n}_1), \sigma_2, \ldots$ and $\rho' = (e'_0, \mathbf{n}'_0), \sigma'_1, (e'_1, \mathbf{n}'_1), \sigma'_2, \ldots$, we set $\rho \preceq \rho'$ if $|\rho| = |\rho'|$ and $e_i = e'_i$, $\sigma_i = \sigma'_i$, and $\mathbf{n}_i \preceq \mathbf{n}'_i$ for each $i \geq 0$. The downclosure $\downarrow R$ of a set $R$ of preruns of $P$ is defined as $\downarrow R = \{\rho \in \mathrm{Preruns}(P) \mid \exists \rho' \in R.\ \rho \preceq \rho'\}$.

A run of an AP $P = (D, \Sigma, (L_c)_{c \in C}, d_0, \mathbf{m}_0)$ is a prerun $\rho = (d_0, \mathbf{m}_0), \sigma_1, (d_1, \mathbf{m}_1), \sigma_2, \ldots$ starting with the initial configuration $(d_0, \mathbf{m}_0)$, where for each $i \geq 0$, we have $(d_i, \mathbf{m}_i) \xrightarrow{\sigma_{i+1}} (d_{i+1}, \mathbf{m}_{i+1})$. The set of runs of $P$ is denoted $\mathrm{Runs}(P)$, and $\downarrow\mathrm{Runs}(P)$ is its downclosure with respect to $\preceq$.

An infinite run $c_0 \xrightarrow{\sigma_0} c_1 \xrightarrow{\sigma_1} \cdots$ is fair if for all $i \geq 0$, if $\sigma \in c_i.\mathbf{m}$ then there is some $j \geq i$ such that $c_j \xrightarrow{\sigma} c_{j+1}$. That is, whenever an instance of a handler is posted, some instance of that handler is executed later. Fairness does not preclude that a specific instance of a handler is never executed. An infinite fair run starves handler $\sigma$ if there exists an index $J \geq 0$ such that for each $j \geq J$, we have (i) $c_j.\mathbf{m}(\sigma) \geq 1$ and (ii) whenever $c_j \xrightarrow{\sigma} c_{j+1}$, we have $c_j.\mathbf{m}(\sigma) \geq 2$. In this case, even if the run is fair, a specific instance of $\sigma$ may never be executed.

Now we give the definitions of the various decision problems.

**Definition 2 (Properties of finite runs).** The **Safety (Global state reachability)** problem asks, given an asynchronous program $P$ and a global state $d_f \in D$, is there a reachable configuration $c$ such that $c.d = d_f$? If so, $d_f$ is said to be reachable (in $P$), and unreachable otherwise. The **Boundedness (of the task buffer)** problem asks, given an asynchronous program $P$, is there an $N \in \mathbb{N}$ such that for every reachable configuration $c$, we have $|c.\mathbf{m}| \leq N$? If so, the asynchronous program $P$ is bounded; otherwise it is unbounded. The **Configuration reachability** problem asks, given an asynchronous program $P$ and a configuration $c$, is $c$ reachable?

**Definition 3 (Properties of infinite runs).** All the following problems take as input an asynchronous program P. The **Termination** problem asks if all runs of P are finite. The **Fair Non-termination** problem asks if P has some fair infinite run. The **Fair Starvation** problem asks if P has some fair run that starves some handler.

Our main result in this section shows that many properties of an asynchronous program $P$ depend only on the downclosure $\downarrow\mathrm{Runs}(P)$ of the set $\mathrm{Runs}(P)$ of runs of the program $P$. The proof is by induction on the length of runs. For any AP $P = (D, \Sigma, (L_c)_{c \in C}, d_0, \mathbf{m}_0)$, we define the AP $\downarrow P = (D, \Sigma, (\downarrow L_c)_{c \in C}, d_0, \mathbf{m}_0)$, where $\downarrow L_c$ is the downclosure of the language $L_c$ under the subword order.
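For a finite language, the downclosure under the (scattered) subword order can be computed by brute force; the following sketch, with a representation of our own choosing, illustrates the order and the operator $\downarrow$ (for infinite languages such as $\{a^n b^n\}$ the downclosure is still regular, but must be computed symbolically).

```python
from itertools import combinations

def is_subword(u, w):
    """u precedes w in the scattered subword order: u embeds into w
    preserving letter order, e.g. "ab" into "aabb"."""
    it = iter(w)
    return all(any(c == x for x in it) for c in u)

def downclosure(language):
    """All scattered subwords of members of a *finite* language."""
    result = set()
    for w in language:
        for k in range(len(w) + 1):
            for idx in combinations(range(len(w)), k):
                result.add("".join(w[i] for i in idx))
    return result

d = downclosure({"aabb"})
# "ab" and the empty word are subwords of "aabb"; "ba" is not.
assert "ab" in d and "" in d and "ba" not in d
assert all(is_subword(u, "aabb") for u in d)
```
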

**Proposition 1.** Let $P = (D, \Sigma, (L_c)_{c \in C}, d_0, \mathbf{m}_0)$ be an asynchronous program. Then $\downarrow\mathrm{Runs}(\downarrow P) = \downarrow\mathrm{Runs}(P)$. In particular, the following holds. (1) For every $d \in D$, $P$ can reach $d$ if and only if $\downarrow P$ can reach $d$. (2) $P$ is terminating if and only if $\downarrow P$ is terminating. (3) $P$ is bounded if and only if $\downarrow P$ is bounded.

Intuitively, safety, termination, and boundedness are preserved when the multiset of pending handler instances is "lossy": posted handlers can get lost, which corresponds to those handlers never being scheduled. However, if a run demonstrates reachability of a global state, non-termination, or unboundedness in the lossy version, it also corresponds to a run in the original program (and conversely). In contrast, simple examples show that configuration reachability, fair termination, and fair non-starvation are not preserved under downclosures.

### **4 General Decidability Results**

In this section, we characterize those full trios C for which particular problems for asynchronous programs over C are decidable. Our decision procedures will use the following theorem, summarizing the results from [12], as a subprocedure.

**Theorem 1 ([12]).** Safety, boundedness, configuration reachability, termination, fair non-termination, and fair non-starvation are decidable for asynchronous programs over regular languages.

#### **4.1 Safety and termination**

Our first main result concerns the problems of safety and termination.

**Theorem 2.** Let C be a full trio. The following are equivalent: (i) safety is decidable for asynchronous programs over C; (ii) termination is decidable for asynchronous programs over C; (iii) emptiness is decidable for C.


We begin with "(i)⇒(iii)". Let $K \subseteq \Sigma^*$ be given. We construct $P = (D, \Sigma, (L_c)_{c \in C}, d_0, \mathbf{m}_0)$ such that $\mathbf{m}_0 = \sigma$, $D = \{d_0, d_1\}$, $L_{d_0,\sigma,d_1} = K$, and $L_c = \emptyset$ for $c \neq (d_0, \sigma, d_1)$. We see that $P$ can reach $d_1$ iff $K$ is non-empty. Next we show "(ii)⇒(iii)". Consider the alphabet $\Gamma = (\Sigma \cup \{\varepsilon\}) \times \{0,1\}$ and the homomorphisms $g \colon \Gamma^* \to \Sigma^*$ and $h \colon \Gamma^* \to \{\sigma\}^*$, where for $x \in \Sigma \cup \{\varepsilon\}$, we have $g((x,i)) = x$ for $i \in \{0,1\}$, $h((x,1)) = \sigma$, and $h((x,0)) = \varepsilon$. If $R \subseteq \Gamma^*$ is the regular set of words in which exactly one position belongs to the subalphabet $(\Sigma \cup \{\varepsilon\}) \times \{1\}$, then the language $K' := h(g^{-1}(K) \cap R)$ belongs to $C$. Note that $K'$ is $\emptyset$ or $\{\sigma\}$, depending on whether $K$ is empty or not. We construct $P = (D, \Sigma, (L_c)_{c \in C}, d_0, \mathbf{m}_0)$ with $D = \{d_0\}$, $\mathbf{m}_0 = \sigma$, $L_{d_0,\sigma,d_0} = K'$, and $L_c = \emptyset$ for $c \neq (d_0, \sigma, d_0)$. Then $P$ is terminating iff $K$ is empty.

To prove "(iii)⇒(i)", we design an algorithm deciding safety assuming decidability of emptiness. Given an asynchronous program $P$ and a state $d$ as input, the algorithm consists of two semi-decision procedures: one searches for a run of $P$ reaching the state $d$, and the other enumerates regular overapproximations $P'$ of $P$ and checks the safety of $P'$ using Theorem 1. Each $P'$ consists of a regular language $A_c$ overapproximating $L_c$ for each context $c$ of $P$. We use decidability of emptiness to check that $L_c \cap (\Sigma^* \setminus A_c) = \emptyset$, ensuring that $P'$ is indeed an overapproximation.

The algorithm clearly gives a correct answer if it terminates. Hence, we only have to argue that it always does terminate. Of course, if $d$ is reachable, the first semi-decision procedure terminates. In the other case, termination is due to the regularity of downclosures: if $d$ is not reachable in $P$, then Proposition 1 tells us that $\downarrow P$ cannot reach $d$ either. But $\downarrow P$ is an asynchronous program over regular languages; this means there exists a safe regular overapproximation, and the second semi-decision procedure terminates.
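The dovetailing pattern behind this argument can be sketched abstractly: two semi-decision procedures are run in alternation, and whichever halts first decides the answer. The generators below are hypothetical stand-ins for the two searches, not the paper's procedures.

```python
from itertools import count

def dovetail(find_run, find_overapprox):
    """Alternate two semi-decision procedures step by step.
    find_run(n): does the n-th candidate witness reach the bad state d?
    find_overapprox(n): is the n-th regular overapproximation safe?
    Returns True (reachable) or False (safe); loops forever only if
    neither procedure ever succeeds, which the termination argument
    above rules out."""
    for n in count():
        if find_run(n):          # a run reaching d found: d reachable
            return True
        if find_overapprox(n):   # a safe regular overapproximation found
            return False

# Toy instantiation: d is unreachable, and (say) the 5th enumerated
# overapproximation happens to be safe.
assert dovetail(lambda n: False, lambda n: n == 5) is False
```
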

Like the algorithm for safety, the algorithm for termination consists of two semi-decision procedures. By standard well-quasi-ordering arguments, an infinite run of an asynchronous program $P$ is witnessed by a finite self-covering run. The first semi-decision procedure enumerates finite self-covering runs (trying to show non-termination). The second enumerates regular asynchronous programs $P'$ that overapproximate $P$; as before, to check termination of $P'$, it applies the procedure from Theorem 1. Clearly, the algorithm's answer is always correct. Moreover, it gives an answer for every input. If $P$ does not terminate, it will find a self-covering run. If $P$ does terminate, then Proposition 1 tells us that $\downarrow P$ is a terminating finite-state overapproximation. This implies that the second procedure will terminate in that case.

Let us point out a particular example. The class L of languages of lossy channel systems is defined like the class of languages of WSTS with upward-closed sets of accepting configurations as in [13], except that we only consider lossy channel systems [1] instead of arbitrary Well-Structured Transition Systems (WSTS). Then L forms a full trio with decidable emptiness. Although downclosures of lossy channel languages are not effectively computable (an easy consequence of [27]), our algorithm employs Theorem 2 to decide safety and termination.

#### **4.2 Boundedness**

**Theorem 3.** Let C be a full trio. The following are equivalent: (i) boundedness is decidable for asynchronous programs over C; (ii) finiteness is decidable for C.


Clearly, the construction for "(i)⇒(iii)" of Theorem 2 also works for "(i)⇒(ii)": P is unbounded iff K is infinite.

For the converse, we first note that if finiteness is decidable for $C$ then so is emptiness. Given $L \subseteq \Sigma^*$ from $C$, consider the homomorphism $h \colon (\Sigma \cup \{\lambda\})^* \to \Sigma^*$ with $h(a) = a$ for every $a \in \Sigma$ and $h(\lambda) = \varepsilon$. Then $h^{-1}(L)$ belongs to $C$, and $h^{-1}(L)$ is finite if and only if $L$ is empty: in the inverse homomorphism, $\lambda$ can be inserted arbitrarily into any word. By Theorem 2, this implies that we can also decide safety. Since we consider only full trios, it is easy to see that the problem of context reachability reduces to safety: a context $\hat{c} = (\hat{d}, \hat{\sigma}, \hat{d}') \in C$ is reachable in $P$ if there is a reachable configuration $(\hat{d}, \mathbf{m})$ in $P$ with $\mathbf{m}(\hat{\sigma}) \geq 1$.
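The padding idea is simple enough to demonstrate on an explicit finite language standing in for a member of $C$; the function name and the restriction to prefix-padding are our own illustrative assumptions.

```python
def inverse_hom_members(L, pad="x", max_pads=3):
    """Some members of the inverse image of L under the homomorphism h
    that erases the padding letter. Every word of L reappears with any
    number of pads inserted, so the inverse image is infinite exactly
    when L is non-empty. L is a finite set standing in for a language
    from C; prefix-padding alone already suffices for the argument."""
    out = set()
    for w in L:
        for k in range(max_pads + 1):
            out.add(pad * k + w)  # one of many ways to insert k pads
    return out

# L empty  ->  the inverse image is empty, hence finite.
assert inverse_hom_members(set()) == set()
# L non-empty -> one member per pad count, hence infinitely many in total.
assert len(inverse_hom_members({"ab"}, max_pads=10)) == 11
```
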

We now explain our algorithm for deciding boundedness of a given asynchronous program $P = (D, \Sigma, (L_c)_{c \in C}, d_0, \mathbf{m}_0)$. For every context $c$, we first check whether $L_c$ is infinite (feasible by assumption). This partitions the set of contexts of $P$ into the sets $I$ and $F$ of contexts whose corresponding language $L_c$ is infinite and finite, respectively. If any context in $I$ is reachable, then $P$ is unbounded. Otherwise, all reachable contexts have a finite language. For every finite language $L_c$ with $c \in F$, we explicitly find all the members of $L_c$. This is possible because any finite set $A$ can be checked against $L_c$ for equality: $L_c \subseteq A$ can be checked by testing whether $L_c \cap (\Sigma^* \setminus A) = \emptyset$, and $L_c \cap (\Sigma^* \setminus A)$ effectively belongs to $C$; on the other hand, checking $A \subseteq L_c$ just means checking whether $L_c \cap \{w\} \neq \emptyset$ for each $w \in A$, which can be done the same way. We can now construct an asynchronous program $P'$ which replaces all languages for contexts in $I$ by $\emptyset$ and replaces those corresponding to $F$ by the explicit description. Clearly $P$ is bounded iff $P'$ is bounded (since no contexts from $I$ are reachable), and the latter can be decided by Theorem 1.
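The control flow of this procedure can be sketched with the class-specific checks left as oracle stubs; all function names here are ours, and for a concrete full trio the oracles would be instantiated by its decision procedures.

```python
def decide_boundedness(contexts, is_infinite, reachable, enumerate_members):
    """Sketch of the boundedness procedure.
    is_infinite(c): is L_c infinite? (decidable by assumption)
    reachable(c):   is context c reachable? (reduces to safety)
    enumerate_members(c): the explicit finite set L_c, for c with L_c finite.
    """
    I = {c for c in contexts if is_infinite(c)}  # infinite-language contexts
    if any(reachable(c) for c in I):
        return False                             # P is unbounded
    # Replace languages in I by the empty set and those in F by explicit
    # finite sets; the resulting program is regular, so Theorem 1 applies.
    finite_langs = {c: enumerate_members(c) for c in contexts - I}
    return decide_regular_boundedness(finite_langs)

def decide_regular_boundedness(langs):
    # Placeholder for the decision procedure of Theorem 1.
    return True

ctxs = {"c1", "c2"}
# c1 has an infinite language but is unreachable, so this toy P is bounded.
assert decide_boundedness(ctxs, lambda c: c == "c1",
                          lambda c: False, lambda c: {"a", "aa"}) is True
```
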

We observe that boundedness is strictly harder than safety or termination: There are full trios for which emptiness is decidable, but finiteness is undecidable, such as the languages of reset vector addition systems [11] (see [32] for a definition of the language class) and languages of lossy channel systems.

#### **4.3 Configuration reachability and liveness properties**

Theorems 2 and 3 completely characterize for which full trios safety, termination, and boundedness are decidable. We turn to configuration reachability, fair termination, and fair starvation. We suspect that it is unlikely that there is a simple characterization of those language classes for which the latter problems are decidable. However, we show that they are decidable for a limited range of infinite-state systems. To this end, we prove that decidability of any of these problems implies decidability of the others as well, and also implies the decidability of a simple combinatorial problem that is known to be undecidable for many expressive classes of languages.

Let $Z \subseteq \{a,b\}^*$ be the language $Z = \{w \in \{a,b\}^* \mid |w|_a = |w|_b\}$. The $Z$-intersection problem for a language class $C$ asks, given a language $K \subseteq \{a,b\}^*$ from $C$, whether $K \cap Z \neq \emptyset$. Informally, $Z$ is the language of all words with an equal number of $a$'s and $b$'s, and the $Z$-intersection problem asks whether there is a word in $K$ with an equal number of $a$'s and $b$'s.
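For a finite $K$ the question is of course trivial; the point of the problem is that for expressive classes $C$ no such test exists. A minimal sketch on explicit finite sets, with names of our own choosing:

```python
def in_Z(w):
    """w is in Z iff w has equally many a's and b's."""
    return w.count("a") == w.count("b")

def z_intersection_nonempty(K):
    """Decides whether K meets Z, for a finite K standing in for a
    language from C. For infinite classes this is exactly the
    (in general undecidable) Z-intersection problem."""
    return any(in_Z(w) for w in K)

assert z_intersection_nonempty({"aab", "abab"})      # "abab" is in Z
assert not z_intersection_nonempty({"a", "abb"})     # no balanced word
```
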

**Theorem 4.** Let C be a full trio. The following statements are equivalent: (i) configuration reachability is decidable for asynchronous programs over C; (ii) fair non-termination is decidable for asynchronous programs over C; (iii) fair starvation is decidable for asynchronous programs over C.


Moreover, if decidability holds, then $Z$-intersection is decidable for $C$.

We prove Theorem 4 by providing reductions among the three problems and showing that $Z$-intersection reduces to configuration reachability. We use diagrams similar to automata to describe asynchronous programs. Here, circles represent global states of the program, and we draw an edge $d \xrightarrow{\sigma \mid L} d'$ in case $L_{d,\sigma,d'} = L$ in our asynchronous program $P$; furthermore, $L_{d,\sigma,d'} = \emptyset$ whenever there is no edge that specifies otherwise. To simplify notation, we draw an edge $d \xrightarrow{w \mid L} d'$ in an asynchronous program for a word $w = \sigma_1 \cdots \sigma_n \in \Sigma^*$ with $\sigma_1, \ldots, \sigma_n \in \Sigma$, to symbolize a sequence of states

$$
d \xrightarrow{\sigma_1 \mid \{\varepsilon\}} d_2 \xrightarrow{\sigma_2 \mid \{\varepsilon\}} \cdots \xrightarrow{\sigma_{n-1} \mid \{\varepsilon\}} d_n \xrightarrow{\sigma_n \mid L} d'
$$

which removes $\sigma_1, \ldots, \sigma_n$ from the task buffer and posts a multiset of handlers specified by $L$.

Proof of "(ii)⇒(i)" Given an asynchronous program $P = (D, \Sigma, (L_c)_{c \in C}, d_0, \mathbf{m}_0)$ and a configuration $(d_f, \mathbf{m}_f) \in D \times \mathbb{M}[\Sigma]$, we construct an asynchronous program $P'$ as follows. Let $z$ be a fresh letter and let $\mathbf{m}_f = \sigma_1, \ldots, \sigma_n$. We obtain $P'$ from $P$ by adding a new state $d'_f$ and including the following edges:

$$
d_f \xrightarrow{z\sigma_1\cdots\sigma_n \mid \{z\}} d'_f \qquad\qquad d'_f \xrightarrow{z \mid \{z\}} d'_f
$$

Starting from $(d_0, \mathbf{m}_0 \oplus z)$, the program $P'$ has a fair infinite run iff $(d_f, \mathbf{m}_f)$ is reachable in $P$. The 'if' direction is obvious. Conversely, $z$ has to be executed in any fair run $\rho$ of $P'$, which implies that $d'_f$ is reached by $P'$ in $\rho$. Since only $z$ can be executed at $d'_f$ in $\rho$, this means that the multiset is exactly $\mathbf{m}_f$ when $d_f$ is reached during $\rho$. Clearly this initial segment of $\rho$ corresponds to a run of $P$ which reaches the target configuration.

Proof of "(iii)⇒(ii)" We construct $P' = (D, \Sigma', (L'_c)_{c \in C'}, d_0, \mathbf{m}'_0)$ given $P = (D, \Sigma, (L_c)_{c \in C}, d_0, \mathbf{m}_0)$ over $C$ as follows. Let $\Sigma' = \Sigma \cup \{s\}$, where $s$ is a fresh handler. Replace each edge

$$
d \xrightarrow{\sigma \mid L} d' \quad\text{by}\quad d \xrightarrow{\sigma \mid L \,\cup\, Ls} d'
$$

and add a loop $d \xrightarrow{s \mid \{\varepsilon\}} d$

at every state $d \in D$. Moreover, we set $\mathbf{m}'_0 = \mathbf{m}_0 \oplus [s, s]$. Then $P'$ has an infinite fair run that starves some handler if and only if $P$ has an infinite fair run. From an infinite fair run $\rho$ of $P$, we obtain an infinite fair run of $P'$ which starves $s$, by producing $s$ while simulating $\rho$ and consuming it in the loop. Conversely, from an infinite fair run $\rho'$ of $P'$ which starves some $\tau$, we obtain an infinite fair run $\rho$ of $P$ by omitting all productions and consumptions of $s$ and removing the two extra instances of $s$ from all configurations.
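On explicit finite languages, this edge transformation is mechanical; the sketch below applies it to an edge relation represented as a dictionary (the representation and names are ours, and finite sets of post-words stand in for members of $C$).

```python
def starvation_program(edges, states, s="s"):
    """Apply the (iii)=>(ii) construction: replace each edge sigma|L by
    sigma|L u Ls (optionally post one extra s), and add at every state a
    loop s|{eps} that consumes an s and posts nothing."""
    new_edges = {}
    for (d, sigma, d2), L in edges.items():
        new_edges[(d, sigma, d2)] = L | {w + s for w in L}
    for d in states:
        new_edges[(d, s, d)] = {""}   # loop s|{eps}
    return new_edges

edges = {("d0", "a", "d0"): {"", "ab"}}
out = starvation_program(edges, {"d0"})
# Each original post-word reappears with and without the trailing s.
assert out[("d0", "a", "d0")] == {"", "s", "ab", "abs"}
assert out[("d0", "s", "d0")] == {""}
```
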

Proof of "(i)⇒(iii)" From $P = (D, \Sigma, (L_c)_{c \in C}, d_0, \mathbf{m}_0)$ over $C$, for each subset $\Gamma \subseteq \Sigma$ and $\tau \in \Sigma$, we construct an asynchronous program $P_{\Gamma,\tau} = (D', \Sigma', (L'_c)_{c \in C'}, d'_0, \mathbf{m}'_0)$ over $C$ such that a particular configuration is reachable in $P_{\Gamma,\tau}$ if and only if $P$ has a fair infinite run $\rho_{\Gamma,\tau}$, where $\Gamma$ is the set of handlers executed infinitely often in $\rho_{\Gamma,\tau}$ and $\rho_{\Gamma,\tau}$ starves $\tau$. Since there are only finitely many choices for $\Gamma$ and $\tau$, decidability of configuration reachability implies decidability of fair starvation. The idea is that the run $\rho_{\Gamma,\tau}$ exists if and only if there exists a run

$$(d\_0, \mathbf{m}\_0) \xrightarrow{\sigma\_1} \cdots \xrightarrow{\sigma\_n} (d\_n, \mathbf{m}\_n) = (e\_0, \mathbf{n}\_0) \xrightarrow{\gamma\_1} (e\_1, \mathbf{n}\_1) \xrightarrow{\gamma\_2} \cdots \xrightarrow{\gamma\_k} (e\_k, \mathbf{n}\_k), \tag{1}$$

where $\bigcup_{i=1}^{k} \{\gamma_i\} = \Gamma$, $\mathbf{n}_i \in \mathbb{M}[\Gamma]$ for each $1 \leq i \leq k$, $\mathbf{m}_n \preceq \mathbf{n}_k$, and for each $i \in \{1,\ldots,k\}$ with $\gamma_i = \tau$, we have $\mathbf{n}_{i-1}(\tau) \geq 2$. In such a run, we call $(d_0, \mathbf{m}_0) \xrightarrow{\sigma_1} \cdots \xrightarrow{\sigma_n} (d_n, \mathbf{m}_n)$ its first phase and $(e_0, \mathbf{n}_0) \xrightarrow{\gamma_1} \cdots \xrightarrow{\gamma_k} (e_k, \mathbf{n}_k)$ its second phase.

Let us explain how $P_{\Gamma,\tau}$ reflects the existence of a run as in Eq. (1). The set $\Sigma'$ of handlers of $P_{\Gamma,\tau}$ includes $\Sigma$, $\bar{\Sigma}$, and $\hat{\Sigma}$, where $\bar{\Sigma} = \{\bar{\sigma} \mid \sigma \in \Sigma\}$ and $\hat{\Sigma} = \{\hat{\sigma} \mid \sigma \in \Sigma\}$ are disjoint copies of $\Sigma$. This means a multiset in $\mathbb{M}[\Sigma']$ decomposes as $\mathbf{m}' = \mathbf{m} \oplus \bar{\mathbf{m}} \oplus \hat{\mathbf{m}}$ with $\mathbf{m} \in \mathbb{M}[\Sigma]$, $\bar{\mathbf{m}} \in \mathbb{M}[\bar{\Sigma}]$, and $\hat{\mathbf{m}} \in \mathbb{M}[\hat{\Sigma}]$. A run of $P_{\Gamma,\tau}$ simulates the two phases of $\rho_{\Gamma,\tau}$. While simulating the first phase, $P_{\Gamma,\tau}$ keeps two copies of the task buffer, $\mathbf{m}$ and $\bar{\mathbf{m}}$. The copying is easily accomplished by a homomorphism with $\sigma \mapsto \sigma\bar{\sigma}$ for each $\sigma \in \Sigma$. At some point, $P_{\Gamma,\tau}$ switches into simulating the second phase. There, $\bar{\mathbf{m}}$ remains unchanged, so that it stores the value of $\mathbf{m}_n$ in Eq. (1) and can be used in the end to make sure that $\mathbf{m}_n \preceq \mathbf{n}_k$.

Hence, in the second phase, $P_{\Gamma,\tau}$ works, like $P$, only with $\Sigma$. However, whenever a handler $\sigma \in \Sigma$ is executed, it also produces a task $\hat{\sigma}$. These handlers are used at the end to make sure that every $\gamma \in \Gamma$ has been executed at least once in the second phase. Also, whenever $\tau$ is executed, $P_{\Gamma,\tau}$ checks that at least two instances of $\tau$ are present in the task buffer, thereby ensuring that $\tau$ is starved.

In the end, a distinguished final state allows $P_{\Gamma,\tau}$ to execute handlers in $\Gamma$ and $\bar{\Gamma}$ simultaneously to make sure that $\mathbf{m}_n \preceq \mathbf{n}_k$. In its final state, $P_{\Gamma,\tau}$ can execute handlers $\hat{\gamma} \in \hat{\Gamma}$ and $\gamma \in \Gamma$ (without creating new handlers). In the final configuration, there can be no $\hat{\sigma}$ with $\sigma \in \Sigma \setminus \Gamma$, and there has to be exactly one $\hat{\gamma}$ for each $\gamma \in \Gamma$. This guarantees that (i) each handler in $\Gamma$ is executed at least once during the second phase, (ii) every handler executed in the second phase is from $\Gamma$, and (iii) $\mathbf{m}_n$ contains only handlers from $\Gamma$ (because handlers from $\bar{\Sigma}$ cannot be executed in the second phase).

Decidability of $Z$-intersection To complete the proof of Theorem 4, we reduce $Z$-intersection to configuration reachability. Given $K \subseteq \{a,b\}^*$ from $C$, we construct the asynchronous program $P = (D, \Sigma, (L_c)_{c \in C}, d_0, \mathbf{m}_0)$ over $C$ with $D = \{d_0, 0, 1\}$ and $\Sigma = \{a, b, c\}$ by including the following edges:

The initial task buffer is $\mathbf{m}_0 = c$. Then clearly, the configuration $(0, \emptyset)$ is reachable in $P$ if and only if $K \cap Z \neq \emptyset$.

Theorem 4 is useful in the contrapositive to show undecidability. For example, one can show undecidability of Z-intersection for languages of lossy channel systems (see Section 4.1): One expresses reachability in a non-lossy FIFO system by making sure that the numbers of enqueue- and dequeue-operations match. Thus, for asynchronous programs over lossy channel systems, the problems of Theorem 4 are undecidable. We also use Theorem 4 in Section 5 to conclude undecidability for higher-order asynchronous programs, already at order 2.

### **5 Higher-Order Asynchronous Programs**

We apply our general decidability results to asynchronous programs over (deterministic) higher-order recursion schemes (HORS). Kobayashi [21] has shown how higher-order functional programs can be modeled using HORS. In his setting, a program contains instructions that access certain resources. For Kobayashi, the path language of the HORS is the set of possible sequences of instructions. For us, the input program contains post instructions and we translate higher-order programs with post instructions into a HORS whose path language is used as the language of handlers.

We recall some definitions from [21]. The set of types is defined by the grammar $A ::= o \mid A \to A$. The order $\mathrm{ord}(A)$ of a type $A$ is defined inductively by $\mathrm{ord}(o) = 0$ and $\mathrm{ord}(A \to B) = \max(\mathrm{ord}(A) + 1, \mathrm{ord}(B))$. The arity of a type is defined inductively by $\mathrm{arity}(o) = 0$ and $\mathrm{arity}(A \to B) = \mathrm{arity}(B) + 1$. We assume a countably infinite set $\mathrm{Var}$ of typed variables $x : A$. For a set $\Theta$ of typed symbols, the set $\tilde{\Theta}$ of terms generated from $\Theta$ is the least set containing $\Theta$ such that whenever $s : A \to B$ and $t : A$ belong to $\tilde{\Theta}$, then also $s\,t : B$ belongs to $\tilde{\Theta}$. By convention, the type $o \to \cdots (o \to (o \to o))$ is written $o \to \cdots \to o \to o$, and the term $((t_1 t_2) t_3 \cdots) t_n$ is written $t_1 t_2 \cdots t_n$. We write $\bar{x}$ for a sequence $(x_1, x_2, \ldots, x_n)$ of variables.
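The two inductive definitions transcribe directly into code; here types are nested pairs of our own devising, with "o" as the base type and (A, B) standing for A → B.

```python
def ord_(A):
    """ord(o) = 0; ord(A -> B) = max(ord(A) + 1, ord(B)).
    Named ord_ to avoid shadowing Python's builtin ord."""
    if A == "o":
        return 0
    lhs, rhs = A
    return max(ord_(lhs) + 1, ord_(rhs))

def arity(A):
    """arity(o) = 0; arity(A -> B) = arity(B) + 1."""
    return 0 if A == "o" else 1 + arity(A[1])

# (o -> o) -> o -> o, the type of produce and h in Figure 1:
t = (("o", "o"), ("o", "o"))
assert ord_(t) == 2 and arity(t) == 2
```
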

A higher-order recursion scheme (HORS) is a tuple $S = (\Sigma, \mathcal{N}, \mathcal{R}, S)$ where $\Sigma$ is a set of typed terminal symbols of types of order 0 or 1, $\mathcal{N}$ is a set of typed non-terminal symbols (disjoint from the terminal symbols), $S : o$ is the start non-terminal symbol, and $\mathcal{R}$ is a set of rewrite rules $F\,x_1 x_2 \cdots x_n \to t$, where $F : A_1 \to \cdots \to A_n \to o$ is a non-terminal in $\mathcal{N}$, the $x_i : A_i$ are variables, and $t : o$ is a term generated from $\Sigma \cup \mathcal{N} \cup \mathrm{Var}$. The order of a HORS is the maximum order of a non-terminal symbol. We define a rewrite relation $\to$ on terms over $\Sigma \cup \mathcal{N}$ as follows: $F\bar{a} \to t[\bar{x}/\bar{a}]$ if $F\bar{x} \to t \in \mathcal{R}$; and if $t \to t'$, then $t s \to t' s$ and $s t \to s t'$. The reflexive, transitive closure of $\to$ is denoted $\to^*$. A sentential form $t$ of $S$ is a term over $\Sigma \cup \mathcal{N}$ such that $S \to^* t$.

If $N$ is the maximum arity of a symbol in $\Sigma$, then a (possibly infinite) tree over $\Sigma$ is a partial function $\mathit{tr}$ from $\{0, 1, \ldots, N-1\}^*$ to $\Sigma$ that fulfills the following conditions: $\varepsilon \in \mathrm{dom}(\mathit{tr})$, $\mathrm{dom}(\mathit{tr})$ is closed under prefixes, and if $\mathit{tr}(w) = a$ and $\mathrm{arity}(a) = k$, then $\{j \mid wj \in \mathrm{dom}(\mathit{tr})\} = \{0, 1, \ldots, k-1\}$.

A deterministic HORS is one in which there is exactly one rule of the form $F\,x_1 x_2 \cdots x_n \to t$ for every non-terminal $F$. Following [21], we show how a deterministic HORS can be used to represent a higher-order pushdown language arising from a higher-order functional program.

Sentential forms can be seen as ranked trees over $\Sigma \cup \mathcal{N} \cup \mathrm{Var}$. A sequence $\Pi$ over $\{0, 1, \ldots, n-1\}$ is a path of $\mathit{tr}$ if every finite prefix of $\Pi$ is in $\mathrm{dom}(\mathit{tr})$. The set of paths in a tree $\mathit{tr}$ is denoted $\mathrm{Paths}(\mathit{tr})$; note that we are only interested in finite paths in our context. Associated with any path $\Pi = n_1, n_2, \ldots, n_k$ is the word $w_\Pi = \mathit{tr}(n_1)\,\mathit{tr}(n_1 n_2) \cdots \mathit{tr}(n_1 n_2 \cdots n_k)$. Let $\Sigma_1 := \{a \in \Sigma \mid \mathrm{arity}(a) = 1\}$. The path language $L_p(S)$ of a deterministic HORS $S$ is defined as $\{\mathrm{Proj}_{\Sigma_1}(w_\Pi) \mid \Pi \in \mathrm{Paths}(T_S)\}$. The tree language $L_t(S)$ associated with a HORS is the set of finite trees over $\Sigma$ generated by $S$.
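Path words and the projection to $\Sigma_1$ are easy to compute once a tree is given as a map from positions to symbols. The toy tree below is our own hypothetical fragment with single-letter stand-ins ("B" for a binary symbol, "e" for the nullary one), not the actual value tree of the scheme.

```python
def path_word(tr, pi):
    """w_Pi = tr(n1) tr(n1 n2) ... tr(n1 ... nk) for a path given as a
    string of digits; the root label tr('') is not part of the word."""
    return "".join(tr[pi[:i]] for i in range(1, len(pi) + 1))

def proj(word, sigma1):
    """Projection onto the unary alphabet Sigma_1 (drops other symbols)."""
    return "".join(ch for ch in word if ch in sigma1)

# A finite toy tree: a spine of binary B's, then a unary branch c.d.f,
# closed off by the nullary symbol e. Every node has exactly arity-many
# children, as the tree conditions require.
tr = {"": "B", "0": "B", "1": "e",
      "00": "c", "000": "d", "0000": "f", "00000": "e",
      "01": "e"}
w = path_word(tr, "00000")
assert w == "Bcdfe" and proj(w, {"c", "d", "f"}) == "cdf"
```
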

The deterministic HORS corresponding to the higher-order handler s4 from Figure 1 is given by $S = (\Sigma, \mathcal{N}, \mathcal{R}, S)$, where

$$\begin{aligned}
\Sigma &= \{\mathsf{br} : o \to o \to o,\ \mathsf{c}, \mathsf{d}, \mathsf{f} : o \to o,\ \mathsf{e} : o\} \\
\mathcal{N} &= \{S : o,\ F : (o \to o) \to o \to o,\ H : (o \to o) \to o \to o,\ I : o \to o\} \\
\mathcal{R} &= \{S \to F\,I\,\mathsf{e},\quad I\,x \to x,\quad F\,G\,x \to \mathsf{br}\,(F\,(H\,G)\,(\mathsf{f}\,x))\,(G\,x),\quad H\,G\,x \to \mathsf{c}\,(G\,(\mathsf{d}\,x))\}
\end{aligned}$$

The path language is $L_p(S) = \{\mathsf{c}^n \mathsf{d}^n \mathsf{f}^n \mid n \geq 0\}$. To see this, apply the reduction rules to obtain the value tree $T_S$:

$$\begin{aligned}
S &\to F\, I\, \mathsf{e} \to \mathsf{br}\, (F\, (H\, I)\, (\mathsf{f}\, \mathsf{e}))\, (I\, \mathsf{e}) \\
&\to \mathsf{br}\, (F\, (H\, I)\, (\mathsf{f}\, \mathsf{e}))\, \mathsf{e} \\
&\to \mathsf{br}\, (\mathsf{br}\, (F\, (H^2 I)\, (\mathsf{f}^2 \mathsf{e}))\, (H\, I\, (\mathsf{f}\, \mathsf{e})))\, \mathsf{e} \\
&\to \mathsf{br}\, (\mathsf{br}\, (F\, (H^2 I)\, (\mathsf{f}^2 \mathsf{e}))\, (\mathsf{c}\, (I\, (\mathsf{d}\, \mathsf{f}\, \mathsf{e}))))\, \mathsf{e} \\
&\to \mathsf{br}\, (\mathsf{br}\, (F\, (H^2 I)\, (\mathsf{f}^2 \mathsf{e}))\, (\mathsf{c}\, \mathsf{d}\, \mathsf{f}\, \mathsf{e}))\, \mathsf{e} \\
&\to \cdots
\end{aligned}$$

The value tree $T_S$ is thus an infinite spine of $\mathsf{br}$ nodes whose right branch at level $n$ is the chain $\mathsf{c}^n \mathsf{d}^n \mathsf{f}^n \mathsf{e}$.
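To make the unfolding concrete, here is a small Python sketch (our own illustrative encoding, not part of the paper's formalism) that mirrors the rules for $I$, $H$, and $F$ and collects the word spelled by each right branch of the $\mathsf{br}$-spine:

```python
# Illustrative encoding (assumed): a tree node a(t) of a unary terminal is
# the pair (a, t); the nullary terminal e is the string "e".

def I(x):                 # rule: I x -> x
    return x

def H(G):                 # rule: H G x -> c (G (d x))
    return lambda x: ("c", G(("d", x)))

def path_word(t):
    """Project a chain of unary terminals onto the word it spells."""
    out = []
    while t != "e":
        a, t = t
        out.append(a)
    return "".join(out)

def spine_words(depth):
    """Unfold F G x -> br (F (H G) (f x)) (G x), starting from S -> F I e,
    collecting the word on the right branch at each level of the br-spine."""
    words, G, x = [], I, "e"
    for _ in range(depth):
        words.append(path_word(G(x)))
        G, x = H(G), ("f", x)
    return words
```

For instance, `spine_words(4)` yields the words $\mathsf{c}^n \mathsf{d}^n \mathsf{f}^n$ for $n = 0, \ldots, 3$, matching $L_p(S)$.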

A HORS $S$ is called a word scheme if it has exactly one nullary terminal symbol $\mathsf{e}$ and all other terminal symbols $\tilde{\Sigma}$ are of arity one. The word language $L_w(S) \subseteq \tilde{\Sigma}^*$ defined by $S$ is $L_w(S) = \{a_1 a_2 \cdots a_n \mid a_1(a_2(\cdots(a_n(\mathsf{e}))\cdots)) \in L_t(S)\}$. We denote by $\mathcal{H}$ the class of languages $L_w(S)$ that occur as the word language of a higher-order recursion scheme $S$. Note that path languages and languages of word schemes are both word languages over the set $\tilde{\Sigma}$ of unary symbols considered as letters. They are connected by the following proposition.²

**Proposition 2.** For every order-$n$ HORS $S = (\Sigma, \mathcal{N}, S, \mathcal{R})$ there exists an order-$n$ word scheme $S' = (\Sigma', \mathcal{N}', S', \mathcal{R}')$ such that $L_p(S) = L_w(S')$.

A consequence of [21] and Prop. 2 is that the "post" language of higher-order functional programs can be modeled as the language of a word scheme. Hence, we define an asynchronous program over HORS as an asynchronous program over the language class $\mathcal{H}$, and we can use the following results on word schemes.

**Theorem 5.** HORS and word schemes form effective full trios [7]. Emptiness [23] and finiteness [29] of order-$n$ word schemes are $(n-1)$-EXPTIME-complete.

Now Theorems 2 and 3, together with Proposition 2, imply the decidability results in Corollary 1. The undecidability result is a consequence of Theorem 4 and the undecidability of the Z-intersection problem for indexed languages or, equivalently, order-2 pushdown automata, as shown in [33]. Order-2 pushdown automata can be effectively turned into order-2 OI grammars [10], which in turn can be translated into order-2 word schemes [9]. See also [22, Theorem 4].

**Corollary 1.** For asynchronous programs over HORS: (1) Safety, termination, and boundedness are decidable. (2) Configuration reachability, fair termination, and fair starvation are undecidable already at order-2.

**A Direct Algorithm** We say that downclosures are computable for a language class $\mathcal{C}$ if, given a description of a language $L$ in $\mathcal{C}$, one can compute an automaton for the regular language ↓L. From Proposition 1 and Theorem 1,

² The models of HORS (used in model checking higher-order programs [21]) and word schemes (used in language-theoretic explorations of downclosures [15,7]) are somewhat different. Thus, we show an explicit reduction between the two formalisms.

if one can compute downclosures for a language class, then one can avoid the enumerative approaches of Section 4 and get a "direct algorithm." The algorithm replaces each handler by its downclosure and then invokes the decision procedure summarized in Theorem 1. The direct algorithm for asynchronous programs over HORS relies on the recent breakthrough results on computing downclosures.
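The downclosure ↓L is the set of scattered subwords of words in L, so membership of a word in ↓L reduces to a subsequence check against some witness in L. A minimal sketch (our own helper names, purely illustrative):

```python
def is_subword(u, w):
    """True iff u is a (scattered) subword of w, i.e. u embeds into w
    while preserving the order of letters."""
    it = iter(w)               # 'ch in it' consumes the iterator,
    return all(ch in it for ch in u)   # so letters are matched left to right

def in_downclosure(u, witnesses):
    """True iff u lies in the downclosure of one of the given witnesses."""
    return any(is_subword(u, w) for w in witnesses)
```

For example, ↓{aⁿbⁿ | n ≥ 0} = a\*b\*: every word of a\*b\* embeds into a sufficiently large aⁿbⁿ, while any word containing "ba" does not.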

**Theorem 6 ([33,15,7]).** Downclosures are effectively computable for $\mathcal{H}$.

Unfortunately, current techniques for computing downclosures do not yet provide a complexity upper bound, as we describe below. In [33], it was shown that for a full trio $\mathcal{C}$, downclosures are computable if and only if the diagonal problem for $\mathcal{C}$ is decidable. The latter asks, given a language $L \subseteq \Sigma^*$, whether for every $k \in \mathbb{N}$ there is a word $w \in L$ with $|w|_\sigma \geq k$ for every $\sigma \in \Sigma$. The diagonal problem was shown to be decidable first for higher-order pushdown automata [15] and then for word schemes [7]. However, the algorithm from [33] that computes downclosures using an oracle for the diagonal problem employs enumeration to compute a downclosure automaton; the enumeration is thus merely hidden inside the downclosure computation. We conjecture that downclosures can be computed in elementary time for word schemes of fixed order. This would imply an elementary-time procedure for asynchronous programs over HORS of fixed order.
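As a toy illustration of the diagonal property on a finite sample (our own check, not a decision procedure for HORS):

```python
def k_diagonal_witness(words, alphabet, k):
    """Return a word containing every letter of the alphabet at least k
    times, or None if no such word is in the sample."""
    for w in words:
        if all(w.count(a) >= k for a in alphabet):
            return w
    return None

# The language {c^n d^n f^n | n >= 0} from the running example satisfies
# the diagonal property: c^k d^k f^k is a witness for every k.
sample = ["c" * n + "d" * n + "f" * n for n in range(10)]
```

Here `k_diagonal_witness(sample, "cdf", 3)` finds the witness for $k = 3$, while a sample in which every word misses some letter admits no witness even for $k = 1$.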

For handlers over context-free languages, given as PDAs, Ganty and Majumdar [12] show an EXPSPACE upper bound for safety, termination, and boundedness. Their algorithm constructs for each handler a polynomial-size Petri net with certain guarantees (forming a so-called adequate family of Petri nets) that accepts a Parikh-equivalent language. These Petri nets are then used to construct a larger Petri net, polynomial in the size of the asynchronous program and the adequate family of Petri nets, in which safety, termination, or boundedness can be phrased as a query decidable in EXPSPACE.

A natural question is whether a downclosure-based algorithm matches the same complexity. We can replace the Parikh-equivalent Petri nets of [12] with Petri nets recognizing the downclosure of a language. It is an easy consequence of Proposition 1 that the resulting Petri nets can be used in place of the adequate families of Petri nets in the procedures for safety, termination, and boundedness of [12]. Unfortunately, a finite automaton for ↓L may require exponentially many states in the size of the PDA [4], so a naive approach only gives a 2EXPSPACE algorithm.

In the full version of this paper, we show that for each context-free language L, one can construct in polynomial time a 1-bounded Petri net accepting ↓L. (Recall that a Petri net is 1-bounded if every reachable marking has at most one token in each place.) When used in the construction of [12], this matches the EXPSPACE upper bound for safety, termination, and boundedness verification.
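To recall the notion, here is a toy exploration of reachable markings that checks 1-boundedness (our own illustration; it does not reproduce the paper's polynomial-time construction):

```python
def reachable(transitions, m0):
    """Markings reachable from m0. A marking is a tuple of token counts,
    one per place; a transition is a pair (pre, post) of token vectors."""
    seen, stack = {m0}, [m0]
    while stack:
        m = stack.pop()
        for pre, post in transitions:
            if all(mi >= pi for mi, pi in zip(m, pre)):   # enabled?
                m2 = tuple(mi - pi + qi
                           for mi, pi, qi in zip(m, pre, post))
                if m2 not in seen:
                    seen.add(m2)
                    stack.append(m2)
    return seen

def is_1_bounded(transitions, m0):
    """True iff every reachable marking has at most one token per place.
    (This terminates only for bounded nets; fine for an illustration.)"""
    return all(t <= 1 for m in reachable(transitions, m0) for t in m)
```

For instance, a net that shuttles a single token between two places is 1-bounded, but starting the same net with two tokens is not.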

As a byproduct, we get a simple direct construction of a finite automaton for ↓L when L is given as a PDA. This is of independent interest, because earlier constructions of ↓L always start from a context-free grammar and produce (necessarily!) exponentially large NFAs [24,8,4]. The key observation is that the downclosure of the language of a PDA can be represented, after some simple modifications, as the language accepted by the same PDA run with a bounded stack.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Author Index

Abate, Alessandro I-370 Abbasi, Rosa II-242 Ádám, Zsófia II-433 Agrawal, Sakshi II-458 Ahmed, Daniele I-370 Ahrendt, Wolfgang II-242 Alur, Rajeev I-430 Amir, Guy II-203 André, Étienne I-311 Andrianov, Pavel II-423 Andriushchenko, Roman I-191 Apinis, Kalmer II-438 Arias, Jaime I-311 Ashok, Pranav II-326

Backenköhler, Michael I-210 Baek, Seulkee I-59 Bansal, Suguman I-20 Barrett, Clark I-113, II-145, II-203 Bendík, Jaroslav I-291 Beneš, Nikola II-64 Beyer, Dirk II-401 Biere, Armin I-133, II-357 Biewer, Sebastian II-365 Bisping, Benjamin I-3 Blondin, Michael II-3 Bonakdarpour, Borzoo I-94 Bortolussi, Luca I-210 Brim, Luboš II-64 Bryant, Randal E. I-76 Budde, Carlos E. II-373

Carneiro, Mario I-59 Černá, Ivana I-291 Češka, Milan I-191 Chalupa, Marek II-453 Chatterjee, Krishnendu I-20 Chattopadhyay, Agnishom I-330 Chen, Ran II-262 Christakis, Maria II-43 Cohen, Aviad II-87

Darke, Priyanka II-458 Darulova, Eva II-43, II-242

Erhard, Julian II-438 Ernst, Gidon II-24

Fedyukovich, Grigory II-24 Felgenhauer, Bertram II-127 Ferreira, Margarida I-152 Finkbeiner, Bernd II-365 Furuse, Jun II-262

Ganesh, Vijay II-303 Gieseking, Manuel II-381 Giesl, Jürgen I-250 Gol, Ebru Aydin I-291 Gorostiaga, Felipe II-349 Griggio, Alberto I-113 Großmann, Gerrit I-210

Haas, Thomas II-428 Haase, Christoph II-3 Hajdu, Ákos II-433 Hark, Marcel I-250 Hartmanns, Arnd II-373 Hausmann, Daniel I-38 Hecking-Harbusch, Jesko II-381 Hermanns, Holger II-365, II-389 Heule, Marijn J. H. I-59, I-76, II-223 Hojjat, Hossein II-443 Howar, Falk II-448 Hsu, Tzu-Han I-94 Huang, Cheng-Chao I-389

Igarashi, Atsushi II-262 Irfan, Ahmed I-113

Jackermeier, Mathias II-326 Jašek, Tomáš II-453

Jeangoudoux, Clothilde II-43 Junges, Sebastian I-173, I-191

Katoen, Joost-Pieter I-173, I-191, I-230 Katz, Guy II-203 Kaufmann, Daniela II-357 Kawata, Akira II-262 Khoroshilov, Alexey II-423 Klauck, Michaela II-389 Köhl, Maximilian A. II-365, II-389 Křetínský, Jan II-326

Lam, Wing I-270 Lepiller, Julien II-105 Li, Jianlin I-389 Li, Renjue I-389 Li, Yahui I-430 Lochmann, Alexander II-127 Lohar, Debasmita II-43 Loo, Boon Thau I-430 Lynce, Inês I-152

Majumdar, Rupak I-449 Mamouras, Konstantinos I-330 Mann, Makai I-113 Marinov, Darko I-270 Martins, Ruben I-152 Meyer, Fabian I-250 Meyer, Roland II-428 Middeldorp, Aart II-127 Mitterwallner, Fabian II-127 Mues, Malte II-448 Mutilin, Vadim II-423 Myreen, Magnus O. II-223

Nadel, Alexander II-87 Nejati, Saeed II-303 Nestmann, Uwe I-3 Niemetz, Aina II-145, II-303 Nishida, Yuki II-262 Novák, Jakub II-453

Offtermatt, Philip II-3 Osama, Muhammad I-133

Padon, Oded I-113 Pastva, Samuel II-64 Peruffo, Andrea I-370 Petrucci, Laure I-311 Piskac, Ruzica II-105

Platzer, André II-181 Pol, Jaco van de I-311 Ponce-de-León, Hernán II-428 Preiner, Mathias II-145, II-303

Quatmann, Tim I-230

Řechtáčková, Anna II-453 Reger, Giles II-164 Reynolds, Andrew II-145 Rümmer, Philipp II-443 Ryvchin, Vadim II-87

Saan, Simmo II-438 Šafránek, David II-64 Saito, Hiromasa II-262 Sallai, Gyula II-433 Sánchez, César I-94, II-349 Santolucito, Mark II-105 Schäf, Martin II-105 Schiffl, Jonas II-242 Schmid, Stefan I-411 Schnepf, Nicolas I-411 Schnitzer, Yannik II-365 Schoisswohl, Johannes II-164 Schröder, Lutz I-38 Schwarz, Michael II-438 Schwenger, Maximilian II-365 Scott, Joseph II-303 Seidl, Helmut II-438 Sencan, Ahmet I-291 Shamakhi, Ali II-443 Shi, Lei I-430 Sobel, Joshua II-43 Šoková, Veronika II-453 Sotoudeh, Matthew II-281 Spel, Jip I-173 Srba, Jiří I-411 Strejček, Jan II-453 Suenaga, Kohei II-262 Sun, Jun I-389

Tan, Yong Kiam II-181, II-223 Terra-Neves, Miguel I-152 Thakur, Aditya V. II-281 Thinniyam, Ramanathan S. I-449 Tinelli, Cesare II-145

Ulbrich, Mattias II-242

Vardi, Moshe Y. I-20 Venkatesh, R. II-458 Ventura, Miguel I-152 Vogler, Ralf II-438 Vojdani, Vesal II-438 Voronkov, Andrei II-164

Wang, Jingyi I-389 Wang, Zhifu I-330 Wei, Anjiang I-270 Weinhuber, Christoph II-326 Weininger, Maximilian II-326 Weiss, Gail I-351 Wijs, Anton I-133 Wolf, Verena I-210 Wu, Haoze II-203

Xie, Tao I-270 Xue, Bai I-389

Yadav, Mayank II-326 Yang, Pengfei I-389 Yanich, Ann II-381 Yellin, Daniel M. I-351 Yi, Pu I-270

Zetzsche, Georg I-449 Zhang, Lijun I-389